FOREWORD


The  theme  of  the  Thirteenth  Conference  on  the  Design  of  Experiments 
in  Army  Research,  Development  and  Testing,  as  suggested  by  Dr.  Walter  Foster, 
was  "Design  and  Analysis  for  Engineering  Experimentation".  This  was  a  very 
appropriate  theme  in  the  light  of  the  recent  activities  of  the  two  hosts  for 
the  meeting.  This  conference  was  held  at  Fort  Belvoir,  Virginia,  on  1-3 
November  1967,  and  the  U.S.  Army  Mobility  Equipment  Research  and  Development 
Center  served  together  with  the  U.S.  Army  Engineer  Topographic  Laboratories 
as  joint  hosts.  The  Army  Mathematics  Steering  Committee,  sponsors  of  these 
meetings  on  behalf  of  the  Office  of  the  Chief  of  Research  and  Development, 
would  like  to  thank  these  two  agencies  for  so  ably  serving  the  conference  in 
this  capacity.  A  large  number  of  persons  at  Fort  Belvoir  helped  with  the 
various  details  needed  to  run  a  meeting  of  this  size.  We  would  like  to 
express  the  thanks  of  the  attendees  for  the  many  courtesies  shown  them.  In 
particular,  their  thanks  are  due  to  Mr.  James  B.  Duff,  Chairman  on  Local 
Arrangements,  for  his  excellent  execution  of  the  many  details  needed  to 
make  the  symposium  run  smoothly. 

The  invited  speakers  for  the  conference  featured  five  nationally 
known  scientists.  Their  names  and  the  titles  of  their  addresses  are  noted 
below: 


Regression Analysis
Professor Francis J. Anscombe, Yale University

Some Comments on Matching
Professor K.A. Brownlee, University of Chicago

Some Statistical Methods in Machine Intelligence Research
Professor I.J. Good, Virginia Polytechnic Institute

Maximum Likelihood Estimation of Reliability
Dr. Frank Proschan, Boeing Company

Data Analysis
Dr. M.B. Wilk, Bell Telephone Laboratories

In  addition  to  these  talks,  there  were  29  contributed  papers  which  covered 
a  wide  range  of  design  and  statistical  problems.  Following  the  banquet, 
which  was  held  at  the  Officers'  Club,  it  was  my  pleasure  to  present  the 
Third  Wilks  Memorial  Medal  to  Professor  William  G.  Cochran  of  Harvard 
University.  We  are  pleased  to  be  able  to  include  in  these  Proceedings 
Dr.  Cochran's  acceptance  speech. 

This  volume  of  the  Proceedings  contains  26  of  the  papers  which  were 
presented  at  this  meeting.  The  Army  Mathematics  Steering  Committee  has 
asked  that  these  articles  on  modern  principles  on  the  design  of  experiments, 
together  with  the  application  of  these  ideas,  be  made  available  in  the  form 
of  this  technical  manual.  Members  of  this  committee  take  this  opportunity 
to  express  their  thanks  to  the  many  speakers  and  other  research  workers  who 
participated  in  the  conference. 


The  conference  had  an  attendance  of  173  scientists;  and  71  organizations 
were  represented.  Speakers  and  panelists  came  from  Yale  University,  University 
of  Chicago,  North  Carolina  State  University  at  Raleigh,  the  National  Institutes 
of  Health,  Harvard  Computing  Center,  University  of  North  Carolina  at  Chapel  Hill, 
University  of  Georgia,  Cornell  University,  University  of  Wisconsin,  Boeing 
Scientific  Research  Laboratories,  Stanford  University,  Duke  University,  Virginia 
Polytechnic  Institute,  Stanford  Research  Institute,  the  National  Aeronautics 
and  Space  Administration,  Bell  Telephone  Laboratories,  and  fourteen  Army 
facilities. 

Lieutenant  Colonel  John  H.  Cain,  Deputy  Commander  of  the  U.S.  Army  Mobility 
Equipment  Research  and  Development  Center,  and  Lieutenant  Colonel  William  R. 
Cordova,  Deputy  Commander  of  the  U.S.  Army  Engineer  Topographic  Laboratories, 
both  welcomed  members  of  the  conference  to  Fort  Belvoir.  Their  comments  to 
those  in  attendance  contained  many  interesting  statements  about  the  work  being 
performed  by  the  host  installation.  Their  remarks  are  printed  here  for  the 
benefit  of  those  who  did  not  have  an  opportunity  to  attend  the  symposium. 

The  Chairman  wishes  to  take  this  opportunity  to  thank  members  of  his 
Advisory Committee (Cuthbert Daniel, F.G. Dressel, Walter D. Foster, Fred
Frishman, Lawrence Gambino, Bernard Greenberg, Bernard Harris, Boyd
Harshbarger, J.S. Hunter, William Kruskal, H.L. Lucas, Jr., Clifford Maloney,
and Frank Robertson) for their assistance in formulating the agenda and their
help in selecting the invited speakers.


Frank  E.  Grubbs 
Conference  Chairman 


WELCOME REMARKS

LTC  John  H.  Cain 


Good  morning,  ladies  and  gentlemen ! 

I am Colonel Cain, deputy commander of the U.S. Army Mobility Equipment
Research and Development Center, your co-host for this three-day meeting.
Speaking on behalf of Colonel O'Donnell, the Center's commanding officer, I
am happy to welcome you here today.

I want to thank the Army Mathematics Steering Committee for sponsoring
its 13th annual Conference on the Design of Experiments at the R&D Center.
It gives so many more of our people an opportunity to become acquainted with
the latest in statistical and mathematical methods for application in their
scientific and engineering work.

Since the R&D Center is both co-host and participant, I would like to
take a few minutes to acquaint you briefly with its mission and facilities.

The  Center  [see  the  first  of  the  following  figures]  was  for  20  years, 
until  two  months  ago  today  to  be  exact,  the  Engineer  Research  and  Development 
Laboratories.  The  change  in  name,  however,  in  no  way  changed  its  location 
in the Defense chain shown here. Now, as then, the Center is THE R&D agency
of the Mobility Equipment Command in St. Louis, a major sub-command of the
Army Materiel Command.

Our  mission  [second  figure]  remains  the  same,  and  in  each  aspect,  from 
research  thru  engineering  for  procurement,  the  goal  remains  the  ultimate  in 
mobility  equipment  for  the  Army. 

To  achieve  this  goal  [third  figure]  the  Center  engages  in  R&D  in  some 
13  areas.  You  can  see  from  the  diversity  of  these  fields  of  endeavor,  that 
a  wide  range  is  offered  for  design  of  experiments. 

Our organization, as shown here [fourth figure], features four R&D
Laboratories: Military, Electro, and Mechanical Technology, and Intrusion
Detection and Sensor. The Engineering Laboratory prepares technical data
packages which give industry the specifications, drawings and other
information it needs to build quality mobility equipment in quantity.

Some 1400 scientific, engineering and support personnel are employed
at the Center [last figure]. The main physical plant is just down the road
a piece. Approximately 30 permanent structures on a 240-acre site house some
of the best R&D facilities in the country. Additional test facilities are
supplied at the 900-acre annex on Belvoir's North Area.

The Center, you will note, has several tenants. One of these, the
U.S. Army Engineer Topographic Laboratories, is co-host for this conference.

To  give  ETL  a  chance  to  add  its  welcome,  I  will  close  now  with  best 
wishes  to  Chairman  Dr.  Frank  E.  Grubbs  and  to  all  of  you  for  a  lucky  13th. 

WELCOME REMARKS

LTC William R. Cordova


Thank you, Colonel Cain.

On behalf of Colonel Anderson, the Commanding Officer of the U.S. Army
Engineer Topographic Laboratories, I take pleasure in welcoming you this
morning.

I might begin by saying that we too have had a recent name change; prior
to 28 July 1967 we were known as the U.S. Army Engineer Geodesy, Intelligence
and Mapping Research and Development Agency, the acronym being GIMRADA.

The mission of USAETL is as follows:

The U.S. Army Engineer Topographic Laboratories (ETL) is a Class II
activity under the Chief of Engineers. It is the principal field activity
of the Corps of Engineers for the accomplishment of research and development
of equipment, procedures and techniques in the specific field of geodesy,
military geography and mapping for application both to troop and to base
plant operations. The Chief of Engineers may assign work to these Laboratories
under research and development projects utilizing either RDT&E funds or other
appropriate funds.

Our  research  and  development  program  in  Mapping  and  Geodesy  includes 
activities  within  the  entire  spectrum  from  basic  research  through  exploratory 
development,  advanced  systems  development  and  finally  engineering  development, 
where  a  particular  system  or  item  is  engineered  for  production  and  service  use. 

Our  primary  goals  are  as  follows: 

a.  Develop  the  capability  to  provide  current  and  adequate  "Terrain 
Data"  when  and  where  needed  for  military  purposes. 

b. Minimize the geodesy and gravity portion of the error budget
of weapons and missiles systems.

c.  Maintain  superiority  in  technology  to  be  able  to  project  the 
state  of  art  and  to  provide  meaningful  forecasts  to  customers. 

The USAETL organization is comprised of two major technical operating
elements:


a. The Research Institute, which conducts basic and applied research
and individually oriented exploratory development involving the
disciplines related to mapping and geodetic sciences, is located in
GSA rental space in Alexandria, Virginia. I believe a number of you
know Mr. Larry Gambino of the Research Institute, who will be presenting
a paper at this session.


b.  Our  Mapping  and  Geographic  Sciences  Laboratory,  which  conducts 
feasibility  studies,  design,  development  and  tests  and  evaluation 
of  systems,  equipment  and  techniques  in  the  specific  fields  of 
mapping,  geodesy  and  geographic  sciences,  is  located  within  the 
MERDC  area  along  with  the  Headquarters  and  support  offices.  We 
currently  have  28  trailers.  The  Chief  of  Engineers  has  approved 
a  site  on  the  North  Fort  Belvoir  Post  for  our  new  building  which 
we hope to get approved in the FY 69 budget.

I  hope  this  very  brief  presentation  has  given  you  a  general  feel  for 
our  mission,  goals  and  work.  If  I  can  be  of  assistance  to  any  of  you,  please 
stop  by  my  office. 

Thank  you  very  much.  I  hope  you  have  an  enjoyable  and  fruitful  conference. 
Proving Ground, Maryland

1500-1530 BREAK

1530-1700 TECHNICAL SESSION III - Auditorium

Chairman:  Gideon  A.  Culpepper,  Missile  Test  and  Evaluation 
Control  Division,  White  Sands  Missile  Range,  New  Mexico 

ON EXPECTED PROBABILITIES OF MISCLASSIFICATION IN DISCRIMINANT
ANALYSIS

P.A. Lachenbruch, School of Public Health, Department of
Biostatistics, University of North Carolina, Chapel Hill,
North Carolina

INTRA-PROFILE VARIANCE

Claude F. Bridges, Institutional Research Division, Office
of Research, U.S. Military Academy, West Point, N.Y.

1530-1700  TECHNICAL  SESSION  IV  -  Room  2E 

Chairman: Henry Ellner, Quality Assurance Directorate,
U.S. Army Materiel Command, Washington, D.C.

A STATISTICAL TEST OF TWO HYPOTHETICAL RELIABILITY GROWTH CURVES
OF THE LOGISTIC FORM IN THE DISCRETE CASE

William ?. Henke, Research Analysis Corporation, McLean,
Virginia

ON FITTING OF THE WEIBULL DISTRIBUTION WITH NON-ZERO LOCATION
PARAMETERS AND SOME APPLICATIONS

Oskar M. Essenwanger, Physical Sciences Laboratory, Research
and Development Division, Redstone Arsenal, Alabama

1730-1830 SOCIAL HOUR - Mackenzie Hall (Officers' Club)*

1830-  Banquet  -  (As  above) 

Presentation  of  the  Samuel  S.  Wilks  Memorial  Award 


*Attendees will not be able to return to motels unless they have their own
transportation.

Thursday, 2 November

0830-1000 CLINICAL SESSION B - Auditorium

Chairman: A.C. Cohen, Department of Statistics, University
of Georgia, Athens, Georgia

Panelists:

Robert Bechhofer, Cornell University
Cuthbert Daniel, Private Consultant
Bernard Harris, Mathematics Research Center, U.S. Army
Henry Mann, Mathematics Research Center, U.S. Army
Frank Proschan, Boeing Scientific Research Laboratories
Herbert Solomon, Stanford University

DETERMINATION OF TBO BY WEIBULL PROBABILITY PARAMETERS FOR
REPAIRABLE COMPONENTS

John L. Mundy, U.S. Army Aviation Materiel Command, St. Louis,
Missouri

0830-1000 TECHNICAL SESSION V - Room 2E

Chairman: Raymond Schnell, U.S. Army Chemical Corps, Edgewood
Arsenal, Maryland

A TECHNIQUE FOR INTERPRETING HIGH ORDER INTERACTIONS

Melvin O. Braaten and John Tonzetich, Duke University,
Representing Shaw Air Force Base, South Carolina, and the
North Carolina Operations Analysis Standby Unit, University
of North Carolina, Chapel Hill, North Carolina

A SIMPLIFIED METHOD FOR FINDING OPTIMUM EXPERIMENTAL DESIGNS

Melvin O. Braaten, Duke University; Ray L. Miller, Jr., Shaw
Air Force Base, South Carolina; Fred W. Judge, Wood-Ivey
Systems Corporation, Winter Park, Florida. Representing
Shaw Air Force Base, S.C., and the North Carolina Operations
Analysis Standby Unit, University of North Carolina, Chapel
Hill, North Carolina

0830-1000 TECHNICAL SESSION VI - Room 2F

Chairman: Erwin Biser, U.S. Army Electronics Command, Fort
Monmouth, New Jersey

DEFINITIVE CALIBRATION OF AN AERIAL CAMERA IN ITS OPERATING
ENVIRONMENT

Lawrence A. Gambino, U.S. Army Topographic Laboratories,
Fort Belvoir, Virginia

DESIGN AND ANALYSIS OF A STATISTICAL EXPERIMENT ON HIGH VOLTAGE
BREAKDOWN IN VACUUM

M.M. Chrepta, G.W. Taylor, and M.H. Zinn, U.S. Army Electronics
Command, Fort Monmouth, New Jersey

1000-1030 BREAK

1030-1130 TECHNICAL SESSION VII - Auditorium

Chairman: Henry Dihm, Advanced Systems Laboratory, Directorate
of Research and Development, U.S. Army Missile Command,
Redstone Arsenal, Alabama

A MODERATELY DISTRIBUTION FREE TECHNIQUE FOR SMALL SAMPLE RELIABILITY
ESTIMATION

Michael G. Billings, U.S. Army Chemical Corps, Dugway Proving
Ground, Utah

1030-1130 TECHNICAL SESSION VIII - Room 2E

Chairman: Agatha Wolman, U.S. Army Strategy and Tactics Group,
Bethesda, Maryland

USE OF REFERENCE COMPONENT MIXTURE DESIGNS IN A CALIBRATION
APPLICATION

Raymond H. Myers, Department of Statistics, Virginia Polytechnic
Institute, Blacksburg, Virginia, and
Bernard J. Alley, U.S. Army Missile Command, Redstone Arsenal,
Alabama

1030-1130 TECHNICAL SESSION IX - Room 2F

Chairman: Joseph Mandelson, Quality Evaluation Division,
Quality Assurance Directorate, U.S. Army Edgewood Arsenal,
Maryland

DEVELOPMENT OF AN IMPROVED MODEL FOR ACOUSTIC SOUND RANGING

Robert P. Lee, Atmospheric Sciences Office, U.S. Army Electronics
Command, White Sands Missile Range, New Mexico

AN EXPERIMENT ON THE METEOROLOGICAL EFFECTS ON SOUND RANGING

William H. Hatch, Atmospheric Sciences Office, U.S. Army
Electronics Command, White Sands Missile Range, New Mexico

1130-1300 LUNCH

1300-1520 CLINICAL SESSION C - Auditorium

Chairman: Harold Fassberg, Research Analysis Corporation,
McLean, Virginia

Panelists:

Robert Bechhofer, Cornell University
O.P. Bruno, U.S. Army Ballistic Research Laboratories
A.C. Cohen, University of Georgia
Walter D. Foster, U.S. Army Biological Laboratories
Boyd Harshbarger, Virginia Polytechnic Institute
H.L. Lucas, Jr., North Carolina State University
Herbert Solomon, Stanford University

PARAMETERS IN R&D IN RELATION TO COST/ACCURACY INVESTIGATION

Robert G. Conard, Systems Evaluation Branch, Advanced Systems
Laboratory, Research & Development Directorate, U.S. Army
Missile Command, Redstone Arsenal, Alabama

ON EXPERIMENTS CONCERNED WITH THE SAMPLING DISTRIBUTION OF
LANCASTER'S PARAMETERS

David R. Howes, U.S. Army Strategy and Tactics Analysis Group,
Bethesda, Maryland

1300-1520 TECHNICAL SESSION X - Room 2F

Chairman: William W. Wolman, Traffic Systems Division, Office
of Research and Development, Bureau of Public Roads, Washington,
D.C.

ESTIMATES OF P(Y < X) AND THEIR APPLICATION TO RELIABILITY
PROBLEMS FOR BOTH CONTINUOUS AND QUANTAL RESPONSE DATA

Bernard Harris and J.D. Church, Mathematics Research Center,
U.S. Army, University of Wisconsin, Madison, Wisconsin

NUMBERS NEEDED FOR DETECTING IMPORTANT DIFFERENCES IN CHI-SQUARE
TESTS

C.J. Maloney, Division of Biologics Standards, National Institutes
of Health, Bethesda, Maryland, and F.M. Wadley, Consultant,
U.S. Army Biological Laboratories, Fort Detrick, Frederick, Md.

ON A STATISTICALLY CONSISTENT ESTIMATE OF AN AVERAGE CUMULATIVE
QUANTAL RESPONSE FUNCTION

George W. Evans II, and Robert C. McCarty, Stanford Research
Institute, Menlo Park, California. Representing the U.S.
Army Research Office-Durham

1300-1520 TECHNICAL SESSION XI - Room 2E

Chairman: Joseph M. Cameron, Statistical Engineering Laboratory,
National Bureau of Standards, Gaithersburg, Maryland

DESIGNS OF EXPERIMENTS AS TELESCOPING SEQUENCES OF BLOCKS

Arthur G. Holms, National Aeronautics and Space Administration,
Lewis Research Center, Cleveland, Ohio

ON A CLASS OF NONPARAMETRIC TESTS FOR INTERACTIONS IN FACTORIAL
EXPERIMENTS

P.K. Sen, School of Public Health, Department of Biostatistics,
University of North Carolina, Chapel Hill, North Carolina

ON THE RANK MOD p OF THE DESIGN MATRIX OF A DIFFERENCE SET

Jessie MacWilliams, Bell Telephone Laboratories, Murray Hill,
New Jersey, and Henry B. Mann, Mathematics Research Center,
U.S. Army, University of Wisconsin, Madison, Wisconsin

1520-1550 BREAK

1550-1700 GENERAL SESSION 2 - Auditorium

Chairman: Professor Boyd Harshbarger, Department of Statistics,
Virginia Polytechnic Institute, Blacksburg, Virginia

SOME STATISTICAL METHODS IN MACHINE INTELLIGENCE RESEARCH

Professor I.J. Good, Department of Statistics, Virginia
Polytechnic Institute, Blacksburg, Virginia

Friday, 3 November

0830-0915 GENERAL SESSION 3 - Auditorium

OPEN MEETING OF THE AMSC SUBCOMMITTEE ON PROBABILITY AND STATISTICS

Presided over by: Dr. Walter D. Foster, Biometric Division,
U.S. Army Biological Laboratories, Fort Detrick, Frederick,
Maryland

0925-1200 GENERAL SESSION 4 - Auditorium

Chairman: Dr. Frank E. Grubbs, Chairman of the Conference,
Ballistic Research Laboratories, Aberdeen Proving Ground,
Maryland

MAXIMUM LIKELIHOOD ESTIMATION OF RELIABILITY

Dr. Frank Proschan, Mathematics Research Laboratory, Boeing
Scientific Research Laboratories, Headquarters Offices, the
Boeing Company, Seattle, Washington

BREAK 

DATA  ANALYSIS  (tentative  title) 

Dr.  M.B.  Wilk,  Statistics  &  Data  Analysis  Research  Department, 
Bell  Telephone  Laboratories,  Murray  Hill,  New  Jersey 


REGRESSION ANALYSIS IN THE COMPUTER AGE

F.J. Anscombe
Department of Statistics
Yale University
New Haven, Connecticut


1. INTRODUCTION. The commonly used methods of statistical analysis took
much of their present-day form in the period of rapid development of statistical
science between the two world wars. They were conditioned, more than perhaps is
generally realized, by the principal computing resource of that period, the
desk calculator. They give just about the best return possible for the amount
of effort that a human being equipped with a desk calculator could reasonably
(or even a little unreasonably) be expected to invest in a statistical analysis.

Now  that  our  computing  resources  are  enormously  greater,  we  need  not  content 
ourselves  with  merely  following  the  procedures  suitable  for  the  desk  calculator. 
Almost anything we might ask for can be had at very little cost. What can we
make  use  of?  What  sorts  of  calculations  and  output  will  give  us  most  understanding, 
least  misunderstanding? 

Our  extended  computing  powers  can  affect  statistical  methods  in  two  ways. 

First, we are able to make better use of traditional methods, or of methods
closely related thereto. Above all, we can now afford to ask freely for scatter
plots. These are tedious to construct by hand, but trivial with a computer. We
can also demand the calculation of residuals, to test agreement of the data
with assumptions underlying the method of analysis. We can afford to make
transformations of variables and repeat analyses, to see if agreement is improved.

Second, we can consider methods of analysis that are radically different
from traditional methods and involve much heavier computation. The great majority
of traditional statistical analysis comes under the heading of "least squares":
regression, analysis of variance, and analogous procedures like the analysis
of contingency tables by χ². The least squares principle was originally advocated
by Laplace and Gauss a century and a half ago because they thought no other method
of combining observations would be computationally feasible. Now there are many
other possibilities, and these should be explored.
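
The first of these two points, computing residuals and repeating the analysis after a transformation, can be illustrated with a minimal Python sketch. The function names and the five readings below are made up for illustration and are not taken from the paper; the fit is an ordinary straight-line least squares fit.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y):
    """Observed minus fitted values for the straight-line fit."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up readings (an assumption for illustration): y is curved on purpose.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

raw_res = residuals(x, y)
# Repeat the analysis after transforming y to its square root.
transformed_res = residuals(x, [math.sqrt(v) for v in y])

print(raw_res)          # systematic pattern: positive, negative, positive
print(transformed_res)  # essentially zero after the transformation
```

The residuals of the untransformed fit change sign in a systematic way, which is the kind of disagreement with the straight-line assumption that a residual listing or plot is meant to expose; after the transformation the disagreement disappears.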

This paper has the modest purpose of illustrating a few features of
statistical analysis in the computer age. A set of gunnery readings, to which
traditional regression analysis is applicable, is examined.

Section 2 contains a brief digression on computing. In section 3 traditional
regression methods are exemplified in their modern guise. In section 4 a
nontraditional analysis is briefly reported.


This  research  was  supported  by  the  Army,  Navy,  Air  Force  and  NASA  under  a 
contract  administered  by  the  Office  of  Naval  Research.  Reproduction  in  whole 
or  in  part  is  permitted  for  any  purpose  of  the  United  States  Government. 

Wilk's paper [5] further exemplifies the impact of the computer on
statistical analysis.

2. STATISTICAL COMPUTING. The computer has not so far had the profound
effect on statistics that it has had on some other fields of science and
technology. The reason is perhaps that good statistical analysis is done
in steps. Methods must be adjusted to fit the data; the adequacy of
theoretical descriptions of "models" must be assessed. This requires
interaction between the investigator and the computer. Fixed program packages
are not altogether satisfactory.

An  explosive  development  of  statistical  science  can  be  expected  once 
programming  can  really  be  done  by  any  interested  person,  without  a  large 
preliminary  investment  of  time  in  mastering  a  computer  language  and  without 
much  time  spent  in  actual  coding.  What  makes  programming  so  tedious  in 
FORTRAN  and  other  commonly  used  languages  is  the  negotiation  of  arrays. 
Arithmetical  operations  are  required,  not  just  on  individual  numbers,  but 
on  whole  vectors  or  matrices;  and  in  these  languages  such  operations  must 
be  spelled  out  in  loops.  Successful  attempts  have  been  made  to  relieve  the 
intolerable  tedium  with  special  computing  systems  for  vectors  and  matrices. 

I  have  had  access  to  an  experimental  implementation  of  Iverson's 
programming  language  known  as  APL  [3,4],  at  IBM's  Thomas  J.  Watson  Research 
Center,  Yorktown  Heights,  N.Y.  APL  is  running  as  a  coding  language  for 
computation  in  conversational  mode  through  typewriter  terminals.  Though 
the  language  was  not  originally  developed  for  statistical  work  (but  rather 
for the precise and concise expression of any algorithms), it is in fact
well adapted to statistical purposes. Two salient reasons are:

(i) APL was designed at the outset to handle (almost indifferently)
scalars, vectors, matrices and rectangular arrays in any number of dimensions.
All the basic arithmetic operations can be performed on arrays just as well
as on scalars, without any loop written in the program. Programs in APL
therefore tend to contain few loops. The programmer is encouraged to think
of array operations as entities without a logically irrelevant internal
sequence; this is aesthetically pleasing, even illuminating.

(ii) There is a high degree of consistency in APL. Syntax is governed
ruthlessly by a very few simple rules. Once the basic vocabulary is learned,
the language is easy to remember. There is a remarkable absence of arbitrary
features that require frequent reference to the manual. The language
therefore has a peculiar dignity and reasonableness. One feels it is worth
learning.
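
The contrast drawn in the first reason, between spelling out loops and operating on whole arrays, can be sketched in Python. This is not APL and not the author's code: the `Vec` class is a hypothetical stand-in for an array type, written here only so that the calling program contains no loop. The class itself still iterates internally, and the data are made up.

```python
class Vec:
    """Hypothetical minimal vector type standing in for an APL-style array."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def _lift(self, other, op):
        # Apply op element by element; a scalar operand is extended
        # to every element of the vector.
        if isinstance(other, Vec):
            if len(other.values) != len(self.values):
                raise ValueError("length mismatch")
            return Vec(op(a, b) for a, b in zip(self.values, other.values))
        return Vec(op(a, other) for a in self.values)

    def __sub__(self, other):
        return self._lift(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._lift(other, lambda a, b: a * b)

    def total(self):
        return sum(self.values)

    def mean(self):
        return self.total() / len(self.values)


def sum_sq_dev_loop(data):
    """Sum of squared deviations from the mean, with every loop spelled out."""
    n = len(data)
    total = 0.0
    for i in range(n):
        total += data[i]
    mean = total / n
    ss = 0.0
    for i in range(n):
        d = data[i] - mean
        ss += d * d
    return ss


def sum_sq_dev_array(v):
    """The same quantity written as operations on the whole vector."""
    d = v - v.mean()
    return (d * d).total()


data = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up numbers
print(sum_sq_dev_loop(data))               # 32.0
print(sum_sq_dev_array(Vec(data)))         # 32.0
```

The second function reads as a statement about the vector as a whole, with no index variable and no implied order of evaluation, which is the property the text attributes to APL programs.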

I  have  elsewhere  [2]  prepared  a  description  of  APL,  with  illustrations 
of  its  use  in  statistical  work.  The  above  remarks  are  abstracted  from  that 
article. 

This  implementation  of  APL  as  a  computer  coding  language  is  not  yet 
available  for  general  use.  Something  like  it  must  surely  become  available 
eventually,  hopefully  soon.  I  am  confident  that  it  will  have  a  profound 
influence  on  the  development  of  statistics.  The  computations  mentioned 
below  were  done  through  an  APL  terminal. 



3. LEAST-SQUARES REGRESSION. The data used for this study of regression
methods were kindly supplied by Dr. Frank E. Grubbs, of the U.S. Army
Ballistic Research Laboratories, Aberdeen Proving Ground. They relate to
some 175 mm. gun firings. In Table 1 we see the following information for
35 rounds: Range (metres), Projectile Weight (lb.), Muzzle Velocity (f.p.s.),
and four items of weather information taken at the maximum ordinate of the
trajectory, namely Temperature (deg. C), Air Density (kgm/1000), Range Wind
and Cross Wind (both in metres per second divided by 10). The first 24 rounds
were fired on one day, between 13.07 and 15.13 hrs. The remaining 11 rounds
were fired the next day between 10.57 and 11.33 hrs.

Let us perform a regression analysis of Range as dependent variable on
the other six variables as predictors (or "independent" variables). As usual,
we shall begin by considering a linear combination of the predictor variables,
and then later consider the possibility of a nonlinear function.

The traditional first step in such a regression analysis is to calculate
the matrix of sums of squares and products of deviations of the seven given
variables from their means, and then perhaps note various correlation coefficients.
What is considerably more informative than the correlation coefficient between
two variables, and just as easily obtained from the computer, is a scatter plot
of the two variables against each other. Before ever any regression is
calculated, a good deal of insight can be obtained by looking at a few such
scatter plots. Here we should expect Muzzle Velocity to have a substantial
predictive effect on Range, as is verified by plotting these against each other.
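
The traditional first step described above, the matrix of sums of squares and products of deviations from the means followed by correlation coefficients, can be sketched as follows. This is an illustrative Python sketch, not the author's APL; the function names are hypothetical and the three short columns are made-up numbers, not the gunnery data.

```python
import math


def deviation_products(columns):
    """Matrix S with S[j][k] = sum of (x_j - mean_j) * (x_k - mean_k)."""
    devs = []
    for col in columns:
        mean = sum(col) / len(col)
        devs.append([v - mean for v in col])
    return [[sum(a * b for a, b in zip(dj, dk)) for dk in devs] for dj in devs]


def correlations(s):
    """Correlation matrix derived from the deviation-product matrix S."""
    p = len(s)
    return [[s[j][k] / math.sqrt(s[j][j] * s[k][k]) for k in range(p)]
            for j in range(p)]


# Made-up columns (an assumption for illustration), one list per variable.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]

s = deviation_products(columns)
r = correlations(s)
print(s[0])   # [10.0, 20.0, -8.0]
print(r[0])   # approximately [1.0, 1.0, -0.8]
```

Each correlation coefficient condenses a pair of columns into a single number, which is why the text recommends looking at the scatter plot of the pair as well.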

So plots of the other predictor variables against Muzzle Velocity are of
interest. One such plot is shown in Figure 1*, where Cross Wind is the other
variable. We shall see that Muzzle Velocity and Cross Wind turn out to be
the only two effective predictors, and in retrospect this diagram is the most
revealing.

The  diagram  shows  more  of  tha  relation  batwtan  Cross  Wind  and  Mustla 
Velocity  than  is  conveyed  by  tha  simple  correlation  coefficient  (which  happana 
to  be  about  0.27).  As  that  correlation  coefficient  indicates,  the  two  variables 
are  only  slightly  related,  ao  far  as  the  calculation  of  linear  ragrasalon  is 
concerned.  But  if  wa  should  wiah  to  calculate  a  nonlinear  prediction  surface 
with  these  two  variables,  it  bscomas  relevant  to  notice  that  vharaaa  tha 
abscissae  (M.V.)  are  diatributod  rather  uniformly  over  an  interval,  tha 
ordinate  (C.W.)  are  clustered  in  two  bands  with  a  alsable  gap  between.  We 
shall  be  able  to  estimate  a  quadratic  response  to  M.V.,  and  also  a  cross- 
product  response  (interaction  between  both  verlablea),  relatively  wall,  but 


*The  plotting  coda  in  the  figures  is  as  follows »  one  observation  la  repre¬ 
sented  by  a  email  circle,  two  coincident  observations  by  a  plus  sign,  three 
or  more  coincident  observations  by  a  star.  Tha  axes  are  shown  by  crosses; 
zero  is  marked  if  it  occurs. 

No  machine  works  perfsctly  ell  the  time.  When  I  ran  off  thaaa  figures 
tha  terminal  showed  an  occasional  wobble  in  tha  left  margin.  The  fault  seemed 
too  trivial  to  warrant  repetition  on  another  tsrmlnel. 
TABLE 1.  175 mm. GUN FIRINGS

 Range   Proj. Wt.   M.V.   Temp.     --       --       --
 2013R    147.40     3001    9.1    0.9985    2.3     -5.0
 20097    147.80     2995    9.1    0.9983    2.2     -5.0
 20096    147.50     3002    9.2    0.9981    2.1     -5.1
 20031    147.40     2997    9.3    0.9976    1.9     -5.1
 20079    147.00     3003    9.3    0.9974    1.8     -5.1
 20001    147.50     3002    9.4    0.9972    1.8     -5.1
 19953    147.80     2993    9.4    0.9971    1.7     -5.2
 20079    147.20     2999    9.4    0.9969    1.6     -5.2
 20391    147.20     3031    9.5    0.9968    1.6     -5.2
 20318    147.25     3021    9.5    0.9965    1.5     -5.2
 20272    147.10     3020    9.5    0.9963    1.4     -5.3
 20319    147.00     3031    9.6    0.9961    1.3     -5.3
 20037
FIGURE 1.  CROSS WIND AGAINST MUZZLE VELOCITY
a quadratic response to C.W. less well than if the points in the diagram had
been more uniformly distributed between the same extremes.

The  six  predictor  variables  are  apparently  uncontrolled.  There  is  no 
indication  of  any  deliberate  variation  in  the  Projectile  Weight  or  Muzzle 
Velocity.  These  could  have  been  intentionally  varied,  but  something  approaching 
an  orthogonal  pattern  of  joint  variation  would  presumably  have  been  adopted. 

The  weather  characteristics  were  apparently  not  deliberately  varied  either, 
since  the  rounds  were  fired  in  two  short  series  one  afternoon  and  the  following 
morning.  We  shall  not  therefore  be  surprised  to  find  that  some  of  the  variables 
have  no  detectable  relation  to  the  "dependent"  variable  Range,  even  though  we 
may  believe  that  with  wider  variability  and  more  numerous  observations  each 
variable  would  be  seen  to  have  an  effect. 


In such a situation a step-by-step procedure of introducing one variable
at a time into the regression relation suggests itself. A simple computational
routine, easily programmed, goes like this. Each time a new predictor variable
is introduced, not only the dependent (Range) vector but all the other
so-far-unused predictor variables are replaced by their projections at right-angles
to the designated predictor vector. All these vectors become vectors of
residuals. By the end of this process, if all the predictor variables are used,
the matrix of their values will have been completely orthogonalized; but we
shall not necessarily go this far. Each variable has been read to only limited
precision (Projectile Weight generally to 0.1 lb. apparently, Muzzle Velocity
to 1 f.p.s., Temperature to 0.1 deg. C, etc.). If at any stage the corresponding
vector of residuals shows little more variability than this round-off error, that
variable should be dropped from further consideration. Usually we shall wish
not to introduce any variable into the regression relation unless its presence
causes a perceptible lowering of the residual mean square. (The objectives
of stepwise regression and possible methods of procedure have been discussed
in the literature; for references see [1].)
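The routine just described can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function names, the
tolerance `tol` standing in for round-off variability, the threshold
`min_gain`, and the small data set are all hypothetical and are not taken from
the firing records.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def sweep(v, x):
    """Replace v by its projection at right angles to x (a residual vector)."""
    coef = dot(v, x) / dot(x, x)
    return [a - coef * b for a, b in zip(v, x)]

def stepwise(y, predictors, min_gain=0.0, tol=1e-12):
    """Introduce one predictor at a time, sweeping it out of everything else.

    y          -- list of responses
    predictors -- dict mapping a name to a list of values
    min_gain   -- smallest drop in residual sum of squares worth a new term
    tol        -- hypothetical floor standing in for round-off variability
    Returns (steps, residuals); steps is a list of (name, drop in residual SS).
    """
    resid = center(y)                      # centering removes the constant term
    pool = {name: center(x) for name, x in predictors.items()}
    steps = []
    while pool:
        # Drop any predictor whose residual vector has almost no variability.
        pool = {n: x for n, x in pool.items() if dot(x, x) > tol}
        if not pool:
            break
        gains = {n: dot(resid, x) ** 2 / dot(x, x) for n, x in pool.items()}
        best = max(gains, key=gains.get)
        if gains[best] <= min_gain:
            break
        xb = pool.pop(best)
        resid = sweep(resid, xb)           # dependent vector becomes residuals
        pool = {n: sweep(x, xb) for n, x in pool.items()}
        steps.append((best, gains[best]))
    return steps, resid

# Made-up data for illustration only: y depends exactly on x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [1.0, 0.0, 1.0, 0.0, 1.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]
steps, resid = stepwise(y, {"x1": x1, "x2": x2, "x3": x3}, min_gain=1e-9)
print([name for name, _ in steps])
```

Each entry of `steps` is the reduction in the residual sum of squares due to
the variable introduced at that step, which is the quantity entered line by
line in an analysis of variance.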

The single variable that shows most relation to Range is Muzzle Velocity,
and after regression on that has been performed the next most related variable
is Cross Wind. The effects may be summarized in the following analysis of
variance of Range:


TABLE 2.  Analysis of Variance of Range

                            Sum of squares   D.f.   Mean square
Muzzle Velocity                 239416         1       239416
Cross Wind (after M.V.)          86169         1        86169
Residual                        143946        32         4498
                                ------        --
Total about mean                469332        34        13830
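As a check on the arithmetic, the mean squares are the sums of squares divided
by their degrees of freedom. The short sketch below recomputes them from the
three component lines; the variance ratios it prints are not quoted in the
text and are shown only for illustration.

```python
# Component lines of the analysis of variance: (source, sum of squares, d.f.)
anova = [
    ("Muzzle Velocity", 239416, 1),
    ("Cross Wind (after M.V.)", 86169, 1),
    ("Residual", 143946, 32),
]

mean_square = {name: ss / df for name, ss, df in anova}
residual_ms = mean_square["Residual"]

# Ratio of each regression mean square to the residual mean square.
f_ratio = {
    name: ms / residual_ms
    for name, ms in mean_square.items()
    if name != "Residual"
}

for name, ss, df in anova:
    print(f"{name:26s} {ss:8d} {df:3d} {mean_square[name]:10.0f}")
for name, f in f_ratio.items():
    print(f"{name:26s} variance ratio {f:6.2f}")
```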


The  corresponding  formula  for  predicting  Range  is 

9.227  (M.V.)  -  19.5  (C.W.)  -  7687.  (1) 
That is about all that seems to be worth doing in the way of simple linear
regression on the available predictor variables. The further reduction in the
residual sum of squares due to introducing any of the other variables is slight.
(Each of the variables has a substantially greater residual mean square, after
regression on M.V. and C.W., than would be caused merely by the apparent
round-off error in the readings, and so would be usable.)

At this stage it is advisable to make scatterplots of the Range residuals*
against (a) the fitted values for Range given by the expression (1) above, and
(b) each of the original six predictor variables in turn. The plot against
Cross Wind suggests a nonlinear dependence of Range on Cross Wind. This plot
and also the plot against Muzzle Velocity suggest that the residual variance
of Range is perhaps changing progressively with these variables.

Now if Range depends on Muzzle Velocity and Cross Wind, it need not do
so merely linearly. In fact, theory suggests that C.W. should have a quadratic
effect. Three more "independent" variables were brought into consideration,
the squares of M.V. and of C.W., and their product. Of these new variables,
only one, the square of C.W., has a mildly "significant" effect, after the
linear regression on M.V. and C.W. already performed. As we saw from Figure 1,
the peculiar distribution of the C.W. values does not permit us to determine
the shape of the response of Range to C.W. very well. Since theory predicts a
quadratic effect we are encouraged to allow for it and replace the "Residual"
line in Table 2 above by the following two lines:

TABLE 3.  Detail in Analysis of Variance of Range

                                      Sum of squares   D.f.   Mean square
C.W. squared (after M.V. and C.W.)         16360         1       16360
Residual                                  127587        31        4116

The corresponding formula for Range is

9.224 (M.V.) - 8.4 (C.W.)^2 - 52.3 (C.W.) - 7645.  (2)

The effect of Cross Wind is apparently to reduce Range by an amount proportional
to (C.W. + 3.1)^2; the reduction is not proportional to the simple square of C.W.
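The shift of 3.1 follows from completing the square in the two Cross Wind
terms of relation (2), taking their coefficients as -8.4 and -52.3. A minimal
sketch (the function name is hypothetical):

```python
def complete_square(a, b):
    """Rewrite a*x**2 + b*x as a*(x + h)**2 + k and return (h, k)."""
    h = b / (2.0 * a)
    k = -b * b / (4.0 * a)
    return h, k

# Cross Wind terms of relation (2): -8.4 (C.W.)^2 - 52.3 (C.W.)
h, k = complete_square(-8.4, -52.3)
print(f"-8.4 (C.W. + {h:.2f})^2 + {k:.1f}")
```

The constant `k` is absorbed into the constant term of the regression, so only
the shift `h` matters for the interpretation given in the text.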

Figures 2, 3, 4 show scatterplots of the new Range residuals (after
regression on M.V., C.W. and C.W. squared) against fitted values, M.V., and
C.W., respectively. They are reasonably satisfactory. We are left with a
suggestion that the residual variance of Range is not constant, or possibly
that the distribution of the Range errors is nonnormal (slightly leptokurtic).


*Rather than plot simple residuals one may plot what are known as standardized
residuals, in which allowance is made for the different weights arising from
the least-squares fitting. In the present case, changing from simple to
standardized residuals makes no perceptible difference in the scatterplots.
Sometimes in regression studies it is profitable to consider making
simple transformations of the variables. Here, Range and Muzzle Velocity
have such small percentage variabilities that no modest power transformation
of them, such as squaring or taking logarithms or reciprocals, can noticeably
affect the behavior of their residuals. In the absence of some suggestion
from theory of a more drastic transformation, we do not pursue the idea.

What have computer facilities done for this regression study that was
not available to the desk calculator operator? Any desk calculator man who
was willing to contemplate six independent variables in regression, using
traditional procedures, would no doubt have reached much the same conclusions.
What we have gained, in addition to ease and speed, is some assurance, based on
liberal inspection of scatterplots (only a few of which are reproduced here),
that our final regression relation fits the data fairly well. That assurance
was not provided by desk-calculator practices. When we examine the goodness
of fit of a regression relation in this way, we sometimes find clear evidence
that a different sort of regression relation ought to be tried instead. Here,
on the contrary, the evidence supports the sort of regression relation we
began with. What we first think of is not always bad!

4. UNORTHODOX REGRESSION. The method of least squares would be a
theoretically perfect means of eliciting information from the observations
if we could know that the form of the regression relation being fitted was
correct and that the "error" part of the dependent variable, the part not
explained by the regression relation, was a random variable independently
normally distributed with zero mean and constant variance. When these ideal
conditions are not satisfied, the least squares results will be to some extent
misleading. Much has been said about least squares estimates' having minimum
variance among unbiased linear estimates, independently of a normality
assumption, but there is no longer today any good reason for restricting attention
to linear estimates. If some method of analysis were known to be better, we
should be prepared to use it.

It is widely believed that if the ideal conditions are not grossly
violated the least squares method is adequate. One way to check whether this
is so is to perform an optimal analysis under weaker conditions, to see whether
perceptibly different results are obtained. Various kinds of weaker analysis
have been suggested. In [1] I have proposed a particular way of weakening the
normality assumption. Instead of assuming that the error part of the dependent
variable is normally distributed with constant variance, we assume that the
errors are independently distributed in a common distribution belonging to a
family having one shape parameter, say $\alpha$. When $\alpha = 0$ the
distribution is normal. When $\alpha > 0$ the distribution is what Karl Pearson
called Type VII, with longer tails than the normal, having the same shape as a
Student distribution. If $\sigma$ is a scale parameter, we assume that the
errors $\epsilon$ have a density function of the form

    $A \sigma^{-1} \left(1 + C (\epsilon/\sigma)^2\right)^{-1/\alpha}$,

where A and C depend on $\alpha$. If $\alpha < 0$ the distribution is what
Pearson called Type II, having shorter tails than the normal distribution and a
finite range. (For further details see [1], where $\alpha^{-1}$ is denoted by m.)
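To see how one shape parameter spans the three cases, the family can be
sketched numerically. The text does not give A and C, so the sketch below
makes two assumptions of its own: it takes C equal to half the shape
parameter (so that the normal density is the limit at zero) and finds A by
numerical integration. Both choices, and all names, are illustrative only and
need not match the parameterization in [1].

```python
import math

def kernel(z, alpha):
    """Unnormalized density (1 + C z^2)^(-1/alpha), assuming C = alpha / 2."""
    if abs(alpha) < 1e-12:
        return math.exp(-0.5 * z * z)      # normal limit at alpha = 0
    base = 1.0 + 0.5 * alpha * z * z
    if base <= 0.0:
        return 0.0                         # outside the finite range (alpha < 0)
    return base ** (-1.0 / alpha)

def density(eps, sigma, alpha, half_width=60.0, steps=60000):
    """Density of an error eps; A is found by a midpoint-rule integration.

    For large positive alpha the tails beyond half_width are cut off, so the
    normalization is slightly approximate.
    """
    h = 2.0 * half_width / steps
    total = h * sum(
        kernel(-half_width + (i + 0.5) * h, alpha) for i in range(steps)
    )
    return kernel(eps / sigma, alpha) / (total * sigma)

for alpha in (-0.25, 0.0, 0.5):
    print(alpha, density(0.0, 1.0, alpha), density(4.0, 1.0, alpha))
```

With these assumed constants a positive shape parameter puts visibly more
density four scale units out than the normal does, while a negative one gives
zero density beyond a finite bound.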
FIGURE 4.  RANGE RESIDUALS AGAINST CROSS WIND
The suggested method of fitting a regression relation under this weaker
assumption about the distribution of the errors is to investigate the
likelihood function, which involves the regression coefficients and also the two
nuisance parameters, $\alpha$ and $\sigma$. It is suggested that the likelihood
function should be integrated with respect to suitable prior distributions for
$\alpha$ and $\sigma$, yielding a marginal likelihood function of just the
regression coefficients; and the latter should if possible be approximated by a
multivariate Student density.

This procedure has been carried out for the above gun firings, with the
following particulars. A simple linear regression on Muzzle Velocity and
Cross Wind was considered, without a term in C.W. squared. The nuisance
parameters $\sigma$ and $\alpha$ were taken to have independent prior
distributions, uniform over the whole real line for $\ln \sigma$, uniform over
the interval (-0.25, 0.75) for $\alpha$. That interval for $\alpha$ was chosen
as including the more plausible values for $\alpha$ (the maximum likelihood
estimate of $\alpha$ turns out to be about 0.12) and should be broad enough to
bring out the qualitative features of this type of analysis. (We should be back
at the method of least squares if $\alpha$ were restricted to the single value
0.) Orthogonal independent variables were used as follows:


    $X_1 = (\mathrm{M.V.}) - 3009.6$,

    $X_2 = (\mathrm{C.W.}) - 0.068666\,(\mathrm{M.V.}) + 209.55$.

Our task is to fit the linear relation

    $E(\mathrm{Range}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$.

Our previous least squares analysis gave the estimates (equivalent to
relation (1) above)

    $\beta_0 = 20139.1, \quad \beta_1 = 7.89, \quad \beta_2 = -19.48.$  (3)
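That the estimates (3) are equivalent to relation (1) can be checked by
substituting the definitions of the two orthogonal variables, read here as
X1 = M.V. - 3009.6 and X2 = C.W. - 0.068666 M.V. + 209.55, and collecting
terms. A small sketch of that substitution (variable names are illustrative):

```python
# Estimates (3) on the orthogonal variables X1 and X2.
b0, b1, b2 = 20139.1, 7.89, -19.48

# X1 = M.V. - 3009.6 and X2 = C.W. - 0.068666 * M.V. + 209.55, so
# b0 + b1*X1 + b2*X2 expands to coef_mv*M.V. + coef_cw*C.W. + const.
coef_mv = b1 - 0.068666 * b2
coef_cw = b2
const = b0 - 3009.6 * b1 + 209.55 * b2

print(f"{coef_mv:.3f} (M.V.) {coef_cw:+.1f} (C.W.) {const:+.0f}")
```

The coefficients agree with relation (1); the constant differs from the
printed 7687 by about one unit in the last place, which is what rounding of
the published coefficients would produce.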

The  estimated  variance  matrix  of  these  three  quantities  was  diagonal,  with 
diagonal  elements 


129  1.17  19.81 

based  on  the  estimated  residual  variance  having  32  degrees  of  freedom. 

In our new analysis we find that the marginal likelihood function of
$\beta_0$, $\beta_1$, $\beta_2$ has its maximum at

    $\beta_0 = 20138.9, \quad \beta_1 = 7.94, \quad \beta_2 = -19.05.$  (4)

The whole function is fairly well approximated by a multivariate Student
density with 33 degrees of freedom and the following estimated variance
matrix:

    $\begin{pmatrix} 128 & 0.6 & 6.0 \\ 0.6 & 1.14 & 0.28 \\ 6.0 & 0.28 & 21.77 \end{pmatrix}$


Comparing the new estimates (4) with the previous estimates (3), we see
that $\beta_0$ has changed by less than 2% of its estimated standard error,
$\beta_1$ by less than 5%, $\beta_2$ by less than 10%. The changes in the
estimated variance matrix and number of degrees of freedom are trivial. For
most practical purposes, our new analysis has given results indistinguishable
from the least squares analysis.
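The percentages can be verified by dividing each change by the least squares
standard error, that is, the square root of the corresponding diagonal element
(129, 1.17, 19.81). A brief sketch, with the estimates read as 20139.1, 7.89,
-19.48 for (3) and 20138.9, 7.94, -19.05 for (4):

```python
import math

least_squares = (20139.1, 7.89, -19.48)   # estimates (3)
variances = (129.0, 1.17, 19.81)          # diagonal of their variance matrix
new_estimates = (20138.9, 7.94, -19.05)   # estimates (4)

# Change in each coefficient as a fraction of its estimated standard error.
changes = [
    abs(new - old) / math.sqrt(var)
    for old, new, var in zip(least_squares, new_estimates, variances)
]
for i, c in enumerate(changes):
    print(f"beta_{i}: {100 * c:.1f}% of a standard error")
```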

Now if the assumption of normally distributed errors with constant variance,
underlying the method of least squares, is false, our weaker assumption of a
Type VII - Type II system with $\alpha$ in the range (-0.25, 0.75) may also be
false. In particular, the distribution of errors could be skew. But the
Type VII - Type II family of error distributions is far broader than the normal
family. If an assumption about distribution shape has an important influence on
conclusions, we might hope to detect this fact through what we have done.

The close agreement of the results of our two types of analysis strongly
suggests that the least squares analysis of this particular body of data was
not much colored by the implied distribution assumption. Whether the same
comforting conclusion would usually be reached in studies of other bodies of
data I do not know.
SOME COMMENTS ON MATCHING*


K.A. Brownlee
University  of  Chicago 
Chicago,  Illinois 


My topic today is "matching" in situations where the response is of
the (0,1) type, firstly in an experimental situation and secondly in an
observational situation. In both cases I wish to advance the suggestion
that, frequently, to use a cliché, the game is not worth the candle
(whatever that means).

In purely experimental work, in which the response is of the (0,1)
type, one may be tempted to use matching. I recall an experiment on weather
modification (an activity to which I tend to refer, in general, as rain
faking) by Braham, Batten and Byers [1], with the cooperation of the U.S.
Air Force. A plane sought out single clouds in the Caribbean. A cloud
that looked as if it met certain specifications would be inspected, and if
it did then a randomized choice would be made as to whether it was to be
seeded. Following the result of the randomization, the plane would fly
through the cloud and either release the seeding agent or not, and then the
cloud would be observed for an appropriate period to see if it developed
radar echoes.

After  the  completion  of  this  period  of  observation,  the  plane  would 
then  seek  another  cloud  which  met  the  specifications.  This  cloud  would 
receive  the  opposite  treatment  to  that  handed  out  to  the  first  cloud.  The 
two  clouds  then  formed  a  matched  pair,  with  responses  as  tabulated  below. 

                       Unseeded
                       +      -
    Seeded     +      n11    n12
               -      n21    n22

If time permitted on that day, the plane would make a second mission, but
if time or gasoline ran out before the second member of the second pair had
been found, then the first member of the second pair had to be abandoned.
Of course, it could also happen that on the plane's first flight it found
one cloud but failed to find another before running out of gasoline.

The  idea  of  using  matched  pairs,  of  course,  was  an  intuitive  one  based 
on  ideas  analogous  to  those  relevant  to  the  concept  of  randomized  blocks. 
Just  as  the  variation  between  plots  close  together  in  the  same  block  is 


* This research was sponsored by the Army Research Office, Office of Naval
Research, and Air Force Office of Scientific Research by Contract No.
Nonr-2121(23), NR 342-043.
considered likely to be less than that between plots in widely separated
blocks, so it was supposed that clouds on the same day were more probable
to resemble each other than clouds on different days.

Some work by Jane Worcester [2] is relevant to this. For example,
supposing that the variation from day to day is represented by equal numbers
of days with probabilities 0.4 and 0.6, and the effect of the treatment is to
increase those probabilities by 0.1 to 0.5 and 0.7, and that a level of
significance $\alpha = 0.05$ is used, then for a power of 0.90 the sample sizes
necessary for paired and unpaired experiments are reported by Worcester to
be 811 and 845 respectively. The use of pairing thus decreases the necessary
sample size by 4.0 per cent, a rather inconsequential amount, particularly
in the context of the experiment I have referred to, where the use of pairs
reduced the number of observations quite appreciably.

If the heterogeneity was more extreme, say equal numbers at probabilities
0.3 and 0.7, then the corresponding sample sizes would be 709 for the paired
and 845 for the unpaired experiment, a reduction of 16.1 per cent.

Of course, in reality the distribution of the probability from day to
day would not be a discrete distribution concentrated in equal proportions
at two points, but instead presumably a unimodal continuous distribution,
with which the effect of heterogeneity would probably be quite modest.

The paired experiment had a further weakness, namely its integrity
was compromised if the observer who selected the clouds was aware of which
treatment was applied to the first cloud. He would then know ahead of time
which treatment would be applied to the second cloud, and could select the
second cloud in accordance with his predilections. The scientists running
the experiment maintained that the man selecting the clouds, in the front
of the plane, was unable to tell whether the seeding agent was released or
not, but nevertheless one wonders whether he could not tell, perhaps
subconsciously, either from the behavior of the plane (for if the seeding agent
was released the plane was appreciably lighter), or from the behavior of
the other members of the crew.

The  famous  calculating  horses  were  able,  apparently,  to  respond  to 
imperceptible  gestures  on  the  part  of  their  human  accomplice,  and  it  is 
conceivable  that  the  human  cloud  selector  was  as  sensitive  as  these  horses. 

In general, if matching is employed but actually is ineffective, then
the power of the experiment is asymptotically unchanged, but for small
samples the matching procedure seems less efficient. For example, for sample
sizes of 10 in the two independent samples, the table

                +     -   Total
        A       0    10     10
        B       5     5     10

gives a two-tailed P value of 0.0326 by Fisher's exact test, but if the data
were to be analyzed as 10 pairs it would have to be

                          A
                      +     -   Total
        B    +        0     5      5
             -        0     5      5
           Total      0    10     10

for which the two-tailed P value is 0.0625.
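Both P values can be recomputed from first principles. The sketch below is an
illustration under stated assumptions: the two-tailed Fisher value is taken as
the sum of the probabilities of all tables no more probable than the one
observed, and the paired value as a two-tailed sign test on the discordant
pairs (5 pairs in one direction, none in the other). The function names are
hypothetical.

```python
from math import comb

def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )

def sign_test_two_tailed(n_plus, n_minus):
    """Two-tailed sign test on the discordant pairs of a matched design."""
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

p_unpaired = fisher_two_tailed(0, 10, 5, 5)
p_paired = sign_test_two_tailed(0, 5)
print(f"unpaired: {p_unpaired:.4f}   paired: {p_paired:.4f}")
```

Under these conventions the unpaired value comes out at about 0.0325, close to
the 0.0326 quoted, and the paired value at 0.0625, so the same ten-and-ten
outcome is significant at the 5 per cent level only when left unpaired.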

This  question  of  power  when  matching  is  ineffective  is  explored  by  L.H. 
Youkeles  [3].  His  results  show  that  this  loss  of  power  ceases  to  be  appreciable 
after  the  two  sample  sizes  have  reached  30. 

I think that it is clear that the motivation to use matching is provided
by its analogy with the idea of randomized blocks. The prestige of this
procedure is so great that I rather uncritically assumed that matching would
be better without thinking through what might happen. The general robustness
of the unmatched completely randomized procedure now seems to me to be preferable
to the hypothetical greater power of the matched design. It seems to me to be
a common failing of the consulting statistician to automatically recommend the
most complicated experimental design he can put over on his client without
considering whether it is in reality justified.

Turning to an observational situation, I have observed that in medical
and sociological investigations one or another form of "matching" is quite
frequently used. One form of matching is the formation of so-called "matched
pairs." One such study, which received a great deal of popular interest, is
part of a paper by E. Cuyler Hammond [4].

Part of this paper contained a matched pair analysis and the procedure
is described in the following quotations:

"...we decided to investigate the matter by studying the death
rates of cigarette smokers and nonsmokers who were alike in many
characteristics other than their smoking habits. This was
accomplished by a matched pair analysis carried out as follows:

Two  groups  of  subjects  were  identified:  1)  men  who  never  smoked 
regularly  and  2)  men  currently  smoking  20  or  more  cigarettes  a  day 
at  enrollment  in  the  study  regardless  of  whether  they  also  smoked 
cigars  or  pipes.  These  two  groups  were  then  divided  into  5-year 
age  groups.  Within  each  age  group,  we  matched  men  by  pairs,  each 
pair  consisting  of  a  nonsmoker  and  a  cigarette  smoker.  The  two  men 
in  a  pair  had  to  be  alike  in  all  the  following  characteristics:  race 
(white,  Negro,  Mexican,  Indian,  or  Oriental);  height;  nativity  (native- 
born  or  foreign-born);  residence  (rural  or  urban);  urban  occupational 
exposure  to  dusts,  fumes,  vapors,  chemical,  radioactivity,  etc.  (yes 
or  no);  religion  (Protestant,  Catholic,  Jewish,  or  none);  education; 
marital status (single, married, widowed, divorced, or separated);
drinking of alcoholic beverages; sleep (under 6 hours, 6-9 hours,
or 10 or more hours per night); usual amount of exercise (none or
some); severe nervous tension (yes or no); history of cancer other
than skin cancer (yes or no); and history of heart disease, stroke,
or high blood pressure (yes or no)....

The matching procedure was carried out with an IBM 1410 computer.
The records for the nonsmokers were put on one magnetic tape and the
records for the cigarette smokers on another. The records in both
tapes were then sorted in order by the codes for all the variables
under consideration as described. Thus on both tapes the records
were arranged in blocks, a block being defined as a group of records
identically coded in all the variables under consideration. By use
of random numbers, the records within each block were arranged in
random order. The 2 tapes were then compared block by block. Blocks
found on only one tape [...] (i.e., the same number of cigarette smokers
as nonsmokers) were accepted as matching pairs. For example, if
a block of 2 cigarette smokers matched a block of 2 nonsmokers, then
2 matched pairs were identified, the first cigarette smoker and the
first nonsmoker being the first pair and the second cigarette smoker
and the second nonsmoker being the second pair. If the matched
blocks were of unequal length, then the excess records in the longer
block were discarded. For example, if a block of 5 cigarette smokers
matched a block of only 2, then the first 2 smokers formed matched
pairs with the 2 nonsmokers, and the last 3 smokers were discarded.
Thus the excess (discarded) records were selected at random since,
within each block, the records were arranged in random order....

With so many characteristics to be considered, many men could
not be matched. However, the computer found 36,975 matched pairs of
men (36,975 nonsmokers and 36,975 cigarette smokers), such that the
2 men in each pair were alike in all the specifications outlined.

...Of the 36,975 nonsmokers, 662 (1.8%) died and, of the 36,975
cigarette smokers, 1,385 (3.7%) died between the start of the study
and September 30, 1962. This difference is statistically significant
(P < 0.000001)."

The matching employed by Hammond is very complex: apart from smoking,
he employed 15 categorizations, some at two levels only and others at several
levels. The number of cells in this 17 dimensional lattice was
2 x 8 x 5 x 2 x 2 x 2 x 4 x 5 x 5 x 5 x 3 x 2 x 2 x 2 x 2 x 2 x 2
= 2^10 x 3 x 4 x 5^4 x 8 = 61 440 000. However, this overestimates the number
of cells, since, although education was recorded in 5 categories, "The two men
in a pair had to be in the same education category or in adjacent categories."
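The cell count is a plain product of the numbers of levels and can be checked
directly. In the sketch below the sixth factor, which is illegible in the
printed product, is taken to be 2; that is an assumption, chosen because it is
the only value consistent with the factored form and the stated total.

```python
from math import prod

# Levels of the 17 categorizations as listed; the sixth entry is assumed.
levels = [2, 8, 5, 2, 2, 2, 4, 5, 5, 5, 3, 2, 2, 2, 2, 2, 2]

cells = prod(levels)
print(len(levels), cells)
```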

Let us consider the case where matching is performed on only one
categorization at 2 levels. The population is thus cross-classified into a
2 x 2 table: for convenience let us continue to use smoking-nonsmoking as one
of the categorizations, represented by the symbols S and T, and
baldness-nonbaldness as the other categorization, represented by B and C. Let
the proportions of the population falling into the four classes be
$\theta_{SB}$, etc.

Suppose that we have a very large sample, so that sampling fluctuations
can be ignored, and let the death rates for each class be $\phi_{SB}$, etc.
Then if the sample size is N, the number of bald smokers is $N\theta_{SB}$, etc.

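Table 1

                         Bald                              Nonbald
              Proportion     Proportion         Proportion     Proportion
                             of Deaths                         of Deaths
Smokers       $\theta_{SB}$  $\theta_{SB}\phi_{SB}$   $\theta_{SC}$  $\theta_{SC}\phi_{SC}$
Nonsmokers    $\theta_{TB}$  $\theta_{TB}\phi_{TB}$   $\theta_{TC}$  $\theta_{TC}\phi_{TC}$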

Matching will consist in matching each bald smoker with a bald nonsmoker
(and analogously for the nonbald columns). Since in general
$\theta_{SB} \neq \theta_{TB}$ there will be an excess of smokers or an excess
of nonsmokers in this category, and the excess will be discarded by random
selection. Thus if $\theta_{SB} > \theta_{TB}$ and $\theta_{SC} > \theta_{TC}$,
the nonsmokers are undisturbed, but the numbers of smokers will be reduced to
$N\theta_{TB}$ and $N\theta_{TC}$ and the number of deaths amongst smokers to
$(N\theta_{TB}\phi_{SB} + N\theta_{TC}\phi_{SC})$.

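The death rate for smokers in the matched sample, say $D_{SM}$, will thus be

    $D_{SM} = \frac{\theta_{TB}\phi_{SB} + \theta_{TC}\phi_{SC}}{\theta_{TB} + \theta_{TC}}.$  (1)

The death rate for nonsmokers in the matched sample, say $D_{TM}$, will be

    $D_{TM} = \frac{\theta_{TB}\phi_{TB} + \theta_{TC}\phi_{TC}}{\theta_{TB} + \theta_{TC}},$  (2)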


and this is, of course, the same as the death rate for nonsmokers in the
unmatched sample, say $D_{TU}$.

For the smokers, however, the death rate in the unmatched sample, say
$D_{SU}$, is

    $D_{SU} = \frac{\theta_{SB}\phi_{SB} + \theta_{SC}\phi_{SC}}{\theta_{SB} + \theta_{SC}}.$  (3)

Now suppose that the ratio of smoker to nonsmoker death rates in the
unmatched sample is the same as in the matched sample, i.e.

    $D_{SU}/D_{TU} = D_{SM}/D_{TM}.$  (4)

Note that this is a rather weak condition. We are not requiring that the
matched samples give the "right" answer for the death rates in, e.g., the
smoking population, but merely that the ratio of death rates of smokers to
nonsmokers be the same in the matched sample as in the population.

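The condition implied in equation (4) implies that

    $\frac{\theta_{SB}\phi_{SB} + \theta_{SC}\phi_{SC}}{\theta_{SB} + \theta_{SC}} = \frac{\theta_{TB}\phi_{SB} + \theta_{TC}\phi_{SC}}{\theta_{TB} + \theta_{TC}},$  (5)

which in turn requires that

    $(\phi_{SC} - \phi_{SB})(\theta_{SC}\theta_{TB} - \theta_{SB}\theta_{TC}) = 0.$  (6)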

Thus the relationship between smokers and nonsmokers is the same in the matched
sample as in the original unmatched sample if either

    (a)  $\phi_{SC} = \phi_{SB}$  (7)

or

    (b)  $\frac{\theta_{SC}}{\theta_{TC}} = \frac{\theta_{SB}}{\theta_{TB}}$  (8)

or (c) both (a) and (b) are simultaneously satisfied.

Condition (a) is that the death rate for bald smokers be the same as the
death rate for nonbald smokers; in other words, for smokers baldness or
nonbaldness does not affect the death rate.

Condition (b) is one form of the familiar independence criterion for two
cross categorizations, for example that the probability of baldness is the
same for smokers as for nonsmokers.


Condition (a) is asymmetrical in that it refers only to smokers. This
asymmetry occurs, of course, because the operation of matching, in the situation
assessed here, alters the relative numbers of bald and nonbald men in the
smoking group only, since the nonsmoking group is left unchanged by the matching
operation.


If the independence between smoking and baldness implied by (8) is not
satisfied, in the particular manner implied by the condition

    $\frac{\theta_{SC}}{\theta_{TC}} < 1 < \frac{\theta_{SB}}{\theta_{TB}},$  (9)

then the smoking class determines the size of the matched group for nonbald
people and the nonsmoking class determines the size of the matched group for
bald people. It is straightforward, but somewhat tedious, to show that (4)
in general requires that

    $\phi_{TB} = \phi_{TC},$  (10)

    $\phi_{SB} = \phi_{SC}.$  (11)

In  other  words  the  death  rate  for  bald  smokers  must  be  the  same  as  the  death 
rate  for  nonbald  smokers  and  also  the  death  rate  for  bald  nonsmokers  must  be 
the  same  as  for  nonbald  nonsmokers. 

These  two  results,  (7)  and  (8),  and  (10)  and  (11),  both  correspond  to 
commonsense.  All  death  rates  are  in  effect  weighted  averages.  The  unmatched 
death rates are weighted averages using as weights the proportions of each
category  in  the  population.  Equation  (7)  follows  from  the  fact  that  if  the 
death  rates  in  the  two  categories  are  equal  it  makes  no  difference  what  weights 
are  used.  Equation  (8)  follows  from  the  fact  that  if  the  proportion  bald/non¬ 
bald  is  the  same  for  the  smokers  as  for  the  nonsmokers,  then  the  matched  sample 
will  have  the  same  weights  as  the  unmatched.  Equations  (10)  and  (11)  are 
similar  to  (8) . 

I  should  like  to  illustrate  this  with  a  small  synthetic  numerical  example. 
Imagine  a  population  being  matched,  smokers  against  nonsmokers,  according  to 
some  factor  such  as  baldness. 

Suppose  that  the  smokers  number  110,000  of  whom  100,000  are  not  bald 
with  a  death  rate  of  1  per  cent  and  10,000  are  bald  with  a  death  rate  of  5 
per cent. Then the overall death rate for smokers is the ratio of the total
number of deaths,

100,000 x 0.01 + 10,000 x 0.05 = 1000 + 500 = 1500,

to the total number of smokers,

100,000 + 10,000 = 110,000.

Thus the death rate for smokers is

1500/110,000 = 1.36%.

Now suppose that the nonsmokers number 35,000, of whom 20,000 are not
bald with a death rate of 1 per cent and 15,000 are bald with a death rate
of 2 per cent. Then the death rate for nonsmokers is

(20,000 x 0.01 + 15,000 x 0.02)/(20,000 + 15,000) = 500/35,000 = 1.43%.

Thus in the population the smokers have a slightly lower death rate than the
nonsmokers; the ratio of death rates is 1.36/1.43 = 0.95.

Now suppose that the smokers and nonsmokers are "matched." For the nonbald
the device that does the matching will be able to select 20,000 smokers out of
the 100,000 available, and these will have an expected number 20,000 x 0.01
= 200 deaths. For the bald the device will take into the matched
sample the 10,000 bald smokers who will have an expected number 10,000 x 0.05
= 500 deaths. Thus for smokers in the matched sample the death rate is

(200 + 500)/(20,000 + 10,000) = 700/30,000 = 2.33%.

For nonbald nonsmokers the 20,000 in the population stay in the matched sample,
producing an expected number 20,000 x 0.01 = 200 deaths, but the bald
nonsmokers are reduced in number to 10,000, for which the expected number of
deaths is 10,000 x 0.02 = 200. Thus for nonsmokers in the matched sample the
death rate is

(200 + 200)/(20,000 + 10,000) = 400/30,000 = 1.33%.

Thus  in  our  matched  sample  the  smokers  have  a  somewhat  higher  death  rate  than 
the nonsmokers, the ratio of death rates being 2.33/1.33 = 1.75. The direction
of  this  relationship  between  the  death  rates  for  smokers  and  nonsmokers  in  a 
matched  sample  is  the  reverse  of  what  occurred  in  the  population. 
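
The arithmetic of this example can be checked with a short sketch. It assumes
only the counts and death rates quoted above; the function names
`death_rate` and `match` are illustrative and are not part of the original
paper. Matching is modelled, as in the text, by trimming each baldness
category to the smaller of the two group sizes.

```python
def death_rate(strata):
    """Overall death rate for a list of (count, rate) strata."""
    deaths = sum(count * rate for count, rate in strata)
    total = sum(count for count, _ in strata)
    return deaths / total


def match(group_a, group_b):
    """Trim each stratum of both groups to the smaller of the two counts."""
    matched_a, matched_b = [], []
    for (n_a, r_a), (n_b, r_b) in zip(group_a, group_b):
        k = min(n_a, n_b)
        matched_a.append((k, r_a))
        matched_b.append((k, r_b))
    return matched_a, matched_b


# Strata are listed as (nonbald, bald); the figures are those of the example.
smokers = [(100_000, 0.01), (10_000, 0.05)]
nonsmokers = [(20_000, 0.01), (15_000, 0.02)]

unmatched_ratio = death_rate(smokers) / death_rate(nonsmokers)
m_smokers, m_nonsmokers = match(smokers, nonsmokers)
matched_ratio = death_rate(m_smokers) / death_rate(m_nonsmokers)

print(f"unmatched ratio of death rates: {unmatched_ratio:.2f}")
print(f"matched ratio of death rates:   {matched_ratio:.2f}")
```

The weights in the matched sample (20,000 and 10,000 in both groups) differ
from those of either population group, which is why the ratio moves from
below one to above one.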

I  think  the  reason  that  matching  proves  misleading  in  this  observational 
situation  is  that  it  is  a  close  relative  of  covariance  analysis.  We  know  that 
in a purely experimental situation it is essential that the concomitant
variable be independent of the experimental treatments, and the same must hold
good in an observational situation. If the concomitant variable is not
independent of the treatments, then hideous fallacies may arise when we "adjust"
the response means. I think the analogous situation may arise in a discrete
matching  situation:  the  frequencies  of  the  matching  criterion  may  be  forced 
into  quite  unrealistic  distributions. 

From  another  point  of  view,  the  matching  procedure  is  forming  a  weighted 
average  for  which  the  weights  are  quite  unrealistic. 

In  other  words,  I  believe  that  the  matching  procedure  adds  seriously  to 
the  difficulties  of  extracting  rigorous  inference  from  observational  data  and 
we  should  be  quite  hesitant  about  employing  it. 
ON METHODS OF OPTIMIZATION OF A MULTIOBJECTIVE SURVEY


John C. Atkinson
Harvard  Computing  Center 
Boston,  Massachusetts 

ABSTRACT. The Wound Data Project is a survey of wounded personnel in
which information is collected about the projectile or thermal agent causing
the wound, and the incurred physiological and psychological effects, together
with the hospital information. The experimental design is not under direct
control, since only those cases that do occur can be observed. The control
that can be exerted in such work consists of proper questionnaire design and
an attempt to continue observation until certain minimum numbers of cases
have occurred in designated categories. In this project, the number of
categories is in the hundreds, making it highly unlikely that the desired
number of cases will be observed in all categories. There is no ability to
control the level at which factors occur (e.g., projectile striking velocity).

The  appropriate  statistics  to  be  used  are  largely  those  developed  in  fields 
specializing  in  survey  work  such  as  epidemiology  and  social  relations.  In 
fact,  the  computer  system  to  be  used  for  file  manipulation  in  this  project, 
DATA-TEXT,  is  one  developed  by  the  Harvard  School  of  Social  Relations. 

The areas to be investigated in this study are dictated by Army
requirements, and information is now being recorded by the field team. The
specific questionnaires from which Hollerith cards will be punched are to be
filled out by the CONUS Team from this data. Adequate medical personnel are
available in the CONUS Team to insure proper medical interpretation of
questions. Areas in which advice is sought from the "clinical" panel include
statistical pitfalls in questionnaire design, and optimum selection of subjects
where choices exist.

If subjects are selected with multiple wounds, individual variation is minimized
and direct comparison allowed between or among physical characteristics such as
penetrating ability. However, the physiological and psychological effects of
a particular wound are unmeasurable in the multiple wound case due to the
confounding. The proper target sample size would appear to be better defined
on attaining a pre-determined number of cases showing some set of characteristics,
rather than by merely observing some total number of cases without regard to
the information content of these cases. However, how to select the proper set,
or sets, of characteristics in a survey where many such combinations exist,
each for some different output of the survey, is difficult. Any selection based
on frequency of observed characteristics implies feedback from the evaluation
team (CONUS) to the collection team (SEA) which are physically separated by some
10,000 miles. The ability of the collection team to "collect" also depends upon
the vagaries of war.
COMPONENTS OF VARIANCE OF A LINEAR FUNCTION
IN REPEATED TRIALS

Walter  D.  Foster 

U.S.  Army  Biological  Laboratories 
Fort  Detrick 
Frederick,  Maryland 

ABSTRACT. The quality ($Q_i$) of the i-th batch of a material diminishes
with time according to a function which is linear in its parameters, a
separate parameter set estimated for each batch. The quality of each batch
is extrapolated to a common future date, $t_f$, by means of its time function.

A weighted mean quality is computed, using the known amount of each batch
as the weight:

$$Q_w = \sum_i w_i Q_i \Big/ \sum_i w_i$$

The problem is to find the variance of the weighted mean, $V(Q_w)$, given the
estimated parameters of each time function and the elapsed time to $t_f$. In
case that the time functions have the form

$$Q_i = A_i + B_i X + C_i Y$$

it is known in a special application that the batch-to-batch distribution
of the $A_i$ is normal and independent of B and C. The bivariate distribution
of B and C has a high covariance, $\rho = .87$, with markedly skewed marginal
distributions, each in the positive direction. It has been acceptable to
write C in terms of B as

$$C = d + e B$$

for another application not discussed here.


The  remainder  of  this  paper  was  reproduced  photographically  from  the  author's 
copy. 
In a manufacturing process, the quality, $Q_i$, of the i-th batch of material
is measured periodically, because quality is known to deteriorate with time.

The deterioration function,

$$Q_i = A_i + B_i X + C_i Y,$$

is fitted to each batch, resulting in a unique set of statistics, (A, B, C),
for each batch. The variable X is time; Y is log(X + 1). The times of
observation are not necessarily the same for each batch, nor is each batch
manufactured at the same time. The quality of each batch is weighted by the
amount of each batch. It is the weighted average of quality, and its variance,
which are required for a fixed time, $t_f$. The following schematic is
illustrative.

[Schematic: Batch A, Batch B and Batch C shown against time, with the fixed
time $t_f$ marked.]

Computationally, it is straightforward to compute Q at time $t_f$ for each
batch and to continue over batches to compute

$$Q_w = \sum_i w_i Q_i \Big/ \sum_i w_i \quad \text{and}$$

$$V(Q_w) = \sum_i w_i (Q_i - Q_w)^2 \Big/ \sum_i w_i .$$

But this is not the problem. It is desired to find a formulation for $V(Q_w)$
involving the distribution of the parameters of the deterioration functions.
The distribution of the $A_i$ is known to be normal (with available estimates of
mean and variance) and independent of B and C. The bivariate distribution of
B and C while not known functionally has a high covariance, r = .87, with
markedly skewed marginal distributions.

Three  cases  are  given  as  successive  stages  of  a  possible  approach  to 
illustrate  the  form  of  a  desired  solution.  General  notation  applicable  to  all 
three cases includes the following:

Let $Q_{ij}$ = j-th observation of quality on the i-th batch, i = 1, ..., I,
j = 1, ..., J

and $Q_{i.}$ = quality of i-th batch averaged over the J observations

and $Q_{..}$ = average quality averaged over all batches

$w_i$ = amount of i-th batch, used as a weight factor

$t$ = time of observation


CASE  1 

No deterioration of batch quality with time. All batches manufactured at
the same time with the same dates of surveillance. Variance of assay the same
for each batch. Pictorially,

[Figure: batch quality plotted against time.]

Let the random model,

$$Q_{ij} = m + c_i + e_{ij}, \qquad i = 1, \ldots, I, \quad j = 1, \ldots, J,$$

with zero covariances represent quality so that $E(Q_{i.}) = m$, which is
estimated by

$$Q_{i.} = \sum_j Q_{ij} / J, \qquad Q_{..w} = \sum_i \sum_j w_i Q_{ij} \Big/ J \sum_i w_i .$$

Note that the previous computation for the variance of the mean, namely,

$$V(Q_{..w}) = \sum_i w_i (Q_{i.} - Q_{..w})^2 \Big/ \sum_i w_i ,$$

neither partitions the source of variation nor uses the distribution of the
parameters of the deterioration functions.
The following results, assuming that the $w_i$ are constant, are obtained from
the expected mean squares which are well known to be given by the analysis of
variance model,

    Source             df           E(MS)

    Between Batches    I - 1        $\sigma^2 + J\sigma_c^2$
    Within Batches     I(J - 1)     $\sigma^2$

from which we have

$$V(Q_{ij}) = \sigma^2 + \sigma_c^2 ,$$

$$V(Q_{i.}) = \sigma^2 / J + \sigma_c^2 \quad \text{and}$$

$$V(Q_{..}) = \sigma^2 / IJ + \sigma_c^2 / I .$$

When the amounts, $w_i$, are known but not equal, the weighted mean is

$$Q_{..w} = \sum_i w_i Q_{i.} \Big/ \sum_i w_i
         = m + \sum_i w_i c_i \Big/ \sum_i w_i + \sum_i \sum_j w_i e_{ij} \Big/ J \sum_i w_i$$

with variance

$$V(Q_{..w}) = (\sigma_c^2 + \sigma^2 / J) \sum_i w_i^2 \Big/ \Big(\sum_i w_i\Big)^2$$

if the covariances are ignored. The partition has a desirable form.
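
As a hedged illustration of the Case I result, the sketch below evaluates
V(Q..w) = (sigma_c^2 + sigma^2/J) * sum(w_i^2) / (sum(w_i))^2 for given
variance components. The function name and the numerical inputs are
assumptions chosen for illustration; they do not come from the paper. With
equal weights the expression reduces to sigma^2/(IJ) + sigma_c^2/I, the
variance of the unweighted grand mean.

```python
def weighted_mean_variance(weights, sigma2, sigma2_c, n_obs):
    """Variance of the weighted mean quality under the Case I random model.

    weights   -- known batch amounts w_i
    sigma2    -- within-batch (assay) variance
    sigma2_c  -- between-batch variance component
    n_obs     -- number of observations J on each batch
    """
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (sigma2_c + sigma2 / n_obs) * sum_w2 / sum_w ** 2


# Hypothetical inputs: four batches, two assays per batch.
equal = weighted_mean_variance([1, 1, 1, 1], sigma2=4.0, sigma2_c=2.0, n_obs=2)
unequal = weighted_mean_variance([4, 2, 1, 1], sigma2=4.0, sigma2_c=2.0, n_obs=2)
print(equal, unequal)
```

Unequal weights can only increase the factor sum(w_i^2)/(sum(w_i))^2 above
1/I, so the weighted mean is never less variable than the equally weighted
one under this model.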


CASE II

The deterioration rate of each batch in time has the same loss coefficient
in the model,

$$Q_{ij} = m + c_i - b t_j + e_{ij} ,$$

where as before all batches have the same date of manufacture and the same
dates of surveillance and the same variance of assay. The following figure
indicates the nature of the problem:

Then the quality of the i-th batch at a fixed time, $t_f$, is given by

$$Q_{i.} = m + c_i - b t_f$$

and the weighted average quality of all batches at time, $t_f$, is

$$Q_{..w} = \sum_i w_i (m + c_i - b t_f) \Big/ \sum_i w_i
         = m - b t_f + \sum_i w_i c_i \Big/ \sum_i w_i$$

with corresponding variances given by

$$V(Q_{i.}) = \sigma^2 \left[ \frac{1}{J} + \frac{(t_f - \bar{t})^2}{I \sum_j (t_j - \bar{t})^2} \right] + \sigma_c^2 , \quad \text{and}$$

$$V(Q_{..w}) = (\sigma_c^2 + \sigma^2 / J) \frac{\sum_i w_i^2}{(\sum_i w_i)^2}
            + \frac{(t_f - \bar{t})^2 \sigma^2}{I \sum_j (t_j - \bar{t})^2}$$

which is partitioned and follows the components-of-variance sense.

CASE  III 

Let  the  deterioration  function  be  representable  by  a  linear  function, 

"u"  ■  +  *i  +  V  ♦  V  +  V 

or  by  a  non-linear  function  such  as 

«ij  •  V'i  +  l)/ri  +  *J> 

whose covariances in both cases are non-zero. Further, the date of manufacture
of each batch is neither the same nor is the distribution of manufacture dates
constant. Finally, neither the number of surveillance periods nor their dates
are necessarily the same from one batch to another. However, the variance of
assay is constant. A pictorial representation is given below.


time 


A continuation of the approach shown in Cases I and II while desirable may not
be tractable. The problem is not so much to estimate

$$Q_{i.} = f_i(t_f) \quad \text{and}$$

$$Q_{..w} = \sum_i w_i f_i(t_f) \Big/ \sum_i w_i ,$$

which are readily computable, as to formulate expressions of their partitioned
variances estimable from the distributions of the model parameters in the
sense of Cases I and II.
A MODEL FOR OBTAINING THE OPERATING CHARACTERISTICS
OF A SKIP LOT SAMPLING PROCEDURE

Allen C. Endres

US  Army  Ammunition  Procurement  &  Supply  Agency 
Joliet,  Illinois 


1.0  INTRODUCTION 

Project SKIP is the name given to a ballistic testing procedure developed and
administered by the Quality Evaluation Division of the U.S. Army Ammunition
Procurement and Supply Agency. The need for such a procedure became evident
when a study of ballistic testing revealed substantial savings could be
effected by properly lowering ballistic test frequencies. The development of
the methodology required to obtain the operating characteristics of the plans
covered by the procedure paralleled its implementation at selected loading
plants.

Fig. 1 depicts the essential steps of the flow diagram of Project SKIP.
The associated verbal transition matrix is contained in Fig. 2. It is seen
that we have a Markov model. Throughout the discussion the various steps of
the flow diagram will be referred to as the states, i.e. qualification state,
restart state, etc. The steady state occupancy probability of a lot being in
state i will be $P_i$. We shall use $P_i'$ to denote the probability of
entering state i on the next step. A step is defined as the testing of the lot.

2.0 METHOD OF DERIVING $P_i$ AND $P_i'$

We shall first restrict ourselves to the case where only tested lots are
considered and temporarily ignore the skip lot possibilities in Step One and
Step Two. Let

$P_0$ = Prob (being in qualification state)

$P_{N1}$ = Prob (being in normal step one)

$P_{N2}$ = Prob (being in normal step two)

$P_{N1*}$ = Prob (being in retrial step one)

$P_{N2*}$ = Prob (being in retrial step two)

$P_R$ = Prob (being in restart state)

and p = Pr (lot meeting all ballistic tests' requirements except those concerned
with critical malfunctions)

$\gamma$ = Pr (critical malfunction)

Define: $P_{R1}$ = Probability (being in restart in process of testing for five
consecutive lots free of critical malfunctions while disregarding all other
malfunctions).

$P_{R2}$ = Probability (being in restart and testing for ten consecutive lots
meeting all requirements).


We shall now investigate the derivation of the $P_i'$. $P_0'$ = Pr (being in 0
two steps ago and rejecting a lot for reasons other than a critical malfunction
on the last step) + Pr (being in N1* two steps ago and rejecting a lot for
reasons other than a critical malfunction on the last step) + Pr (being in N2*
two steps ago and rejecting the lot for reasons other than a critical
malfunction on the last step)

$= P_0 (1-p)(1-\gamma) + P_{N1*} (1-p)(1-\gamma) + P_{N2*} (1-p)(1-\gamma)$

Utilizing (1), (4) and (5) yields


Similar reasoning for $P_{N1}'$, $P_{N1*}'$, $P_{N2}'$, $P_{N2*}'$, $P_{R1}'$,
$P_{R2}'$ yields:

(10)  P'x  -  P;  (1-Y)10  P10+  P^*  (l-Y)4  p1'  +  P^2  (1-y)10  P10 
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v 

T»  1 

*  Nl* 

i  t^  *.\  5  -  5 

-*•  v-*-  IV  K 

/I  *,\ 
V*  1  / 

l-(l--y)  p 

(12) 

p '  m 

N2 

2 *  P4  (1-^‘ 

*+  p  1 

+  *N1 

(13) 

p '  m 

fN2* 

'  Pi2  (i-p) 

(14) 

PR1* 

y 

(15) 

?R2  “ 

p' 

rRl 

(1-Y)5  +  P^2  (1-y) 

/I  tl  * 

'*  r/  *N1 


(l-y)5p5  +  P^2  (1-y)  p 


Equations (9) through (15) define 7 equations in terms of the $P_i$. However an
additional equation is needed since it can be shown that the coefficient matrix
is not of full rank. The needed equation is

$$\sum_i P_i = 1$$

It  was  found  convenient  to  solve  for  P^  and  then  relate  the  remaining  Pj  in  terms 
of  P^.  The  steady  state  probabilities  were  then  obtained  by  substitution  in  (1) 
through  (8) . 


3.0  INCORPORATION  OF  SKIP  LOT  POSSIBILITIES 

The preceding discussion neglected the skipping possibilities in $N_1$ and
$N_2$.

A plausible approach to the skipping anomaly would be to obtain the expected
ratio of total to tested lots in the states of concern, multiply the original
$P_{N1}$ and $P_{N2}$ by these ratios and then force the modified $P_{N1}$,
$P_{N2}$, and the remaining $P_i$ to sum to one by normalization.

Let: $X_1$ = Number of lots tested in $N_1$

$X_2$ = Number of lots skipped in $N_1$

Then $P(X_2 | X_1)$ may, for a given $X_1$, be considered as the expected
number of no-tests before the $X_1$-st test. Hence $X_2$ is distributed as a
negative binomial random variable with expectation

$$E(X_2) = X_1 q / p$$

where q is the probability of a no-test
p is the probability of a test.

In $N_1$ a lot is tested with probability p = 1/2, so that $E(X_2) = X_1$ and
the expected ratio of total to tested lots is 2. A completely analogous
procedure yields a ratio of 3 for $N_2$. Hence $P_{N1}$ is multiplied
by 2 and $P_{N2}$ by 3.

The total procedure is then

(1) Obtain $\sum_i P_i$ over all states i other than $N_1$, $N_2$.

(2) Multiply $P_{N1}$ by 2 and $P_{N2}$ by 3 and add these products to (1).

(3) Divide $P_0$, $P_R$, $P_{N1*}$, $P_{N2*}$ by the sum obtained in (2); also
divide 2 $P_{N1}$ and 3 $P_{N2}$ by that sum.

(4) Each quotient obtained in (3) is defined as $P_i^*$,

and $\sum_i P_i^* = 1$.
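
The four steps above can be sketched as a small normalization routine. This is
only an illustration: the function name, the dictionary keys used as state
labels, and the input probabilities are assumptions made for the sketch, not
values from the paper. The ratios 2 and 3 for the two normal states are the
ones derived in the text.

```python
def adjust_for_skipping(tested_probs, ratios=None):
    """Rescale steady-state probabilities of tested lots to cover skipped lots.

    tested_probs -- dict mapping state label to steady-state probability P_i
    ratios       -- expected ratio of total to tested lots per state;
                    states not listed keep a ratio of 1
    """
    if ratios is None:
        ratios = {"N1": 2.0, "N2": 3.0}
    # Steps (1) and (2): scale the skipping states and form the overall sum.
    scaled = {s: p * ratios.get(s, 1.0) for s, p in tested_probs.items()}
    total = sum(scaled.values())
    # Steps (3) and (4): divide every term by the sum to obtain P_i*.
    return {s: v / total for s, v in scaled.items()}


# Hypothetical steady-state probabilities, for illustration only.
p_tested = {"0": 0.10, "N1": 0.30, "N2": 0.40, "N1*": 0.05, "N2*": 0.05, "R": 0.10}
p_star = adjust_for_skipping(p_tested)
print(p_star)
```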

4.0 DERIVATION OF ACCEPTANCE PROBABILITY

Let:

Expected proportion of accepted lots = $\sum_i (p_{ti} + p_{si}) P_i^*$        (I)

where:

$p_{ti}$ = Pr (lot tested and accepted in state i)

$p_{si}$ = Pr (lot skipped and hence accepted in state i)

Referring to the flow diagram we have:

    STATE     p_ti               p_si

    0         p(1-γ)             0
    N1        1/2 p(1-γ)         1/2
    N2        1/3 p(1-γ)         2/3
    N1*       p(1-γ)             0
    N2*       p(1-γ)             0
    R         p(1-γ)             0

which may be seen to yield:

$$p(1-\gamma) \left[ P_0^* + P_{N1*}^* + P_{N2*}^* + P_R^* + \tfrac{1}{2} P_{N1}^* + \tfrac{1}{3} P_{N2}^* \right]
  + \tfrac{1}{2} P_{N1}^* + \tfrac{2}{3} P_{N2}^*$$

This formula together with an assumed γ = .0002 was used to obtain Figure 3.
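
A minimal sketch of this acceptance-probability calculation follows. It
assumes the tested fractions implied by the discussion of skipping (all lots
tested except in N1, where one half are tested, and N2, where one third are
tested); the state labels, the function name and the sample inputs are
hypothetical and chosen only for illustration.

```python
# Fraction of lots actually tested in each state (assumed labels).
TESTED_FRACTION = {"0": 1.0, "N1": 1 / 2, "N2": 1 / 3, "N1*": 1.0, "N2*": 1.0, "R": 1.0}


def acceptance_probability(p_star, p, gamma):
    """Expected proportion of accepted lots, sum over i of (p_ti + p_si) * P_i*.

    p_star -- dict of normalized state probabilities P_i*
    p      -- Pr(lot meets all requirements other than critical malfunctions)
    gamma  -- Pr(critical malfunction)
    """
    total = 0.0
    for state, prob in p_star.items():
        f = TESTED_FRACTION[state]
        p_t = f * p * (1 - gamma)   # tested and accepted
        p_s = 1 - f                 # skipped and hence accepted
        total += (p_t + p_s) * prob
    return total


# Hypothetical state probabilities; gamma as assumed in the text.
example = {"0": 0.05, "N1": 0.30, "N2": 0.55, "N1*": 0.03, "N2*": 0.02, "R": 0.05}
print(acceptance_probability(example, p=0.95, gamma=0.0002))
```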


5.0 EXPECTED REDUCTION IN TESTING

The expected reduction in testing is:

$$\tfrac{1}{2} P_{N1}^* + \tfrac{2}{3} P_{N2}^*$$

vs. p, the probability of the lot meeting all ballistic tests' requirements
not concerned with critical malfunctions. The asymptote is the maximum
reduction possible for γ = .0002, and was obtained by finding

$$\lim_{p \to 1} (P_{N1}^*, P_{N2}^*) .$$
FIGURE TWO

A MODEL FOR DETERMINING QUALITY
INCENTIVE PAYOFFS FOR PROCUREMENT


Roger Jiymer and Eugene Dutoit
Picatinny Arsenal
Dover,  New  Jersey 

INTRODUCTION .  The  purpose  of  this  paper  is  to  formulate  a  model  for  the 
preparation of Quality Incentive Clauses to be included in Government contracts.
The  model  will  concern  itself  with  those  items  which  are  procured  according 
to  acceptance  criteria  involving  single  sampling  plans  by  attributes. 

A  Quality  Incentive  Clause  is  an  addition  to  a  supply  contract  which  is 
designed to benefit both the contractor and the Government. The clause
provides  for  the  payment  of  a  bonus  to  the  contractor  if  product  quality  is 
above  that  designated  as  acceptable  in  the  product  specification. 

Changes  in  product  quality  will  be  observed  by  selecting  one  or  more 
parameters  which  reflect  item  effectiveness;  changes  in  the  "relative  AQL" 
of  these  parameters  with  respect  to  the  AQL's  outlined  in  the  product  specifica¬ 
tion  will  be  used  to  indicate  differences  in  quality  level. 

Finally,  variations  in  AQL  will  be  combined  with  a  payoff  factor  to  assign 
a  partial  payoff  for  each  parameter.  This  payoff  factor  is  designed  to  adjust 
for  the  relative  importance  of  each  parameter  as  well  as  the  magnitude  of  the 
quality  measurement.  The  sum  of  the  partial  payoffs  will  indicate  the  total 
payoff  to  which  the  contractor  is  entitled. 

THE  GENERAL  MODEL.  This  section  represents  an  outline  of  the  general 
model  proposed  for  formulating  Quality  Incentive  Clauses.  A  brief  explanation 
of  each  of  the  major  segments  of  the  model  is  presented  below.  A  more  elaborate 
discussion  of  the  development  of  each  of  these  segments  will  be  presented  in 
a  later  section. 


The  remainder  of  this  paper  was  photographically  reproduced  from  the  author's 
copy. 
1. Item Control Parameters.-These parameters are used to
measure the "incentive quality" of the item.

2. Ratio Weights.-Weights are assigned to all control parameters
to indicate the relative importance of each parameter in determining
item effectiveness.

3. Maximum Payoff.-This value represents the maximum amount
the purchasing agent is willing to pay for quality in the item.

4. Payoff Factors for Each Control Parameter.-The payoff factor
is a multiplier which transforms a given quality measurement into an
incentive payoff. It is designed to reflect both the magnitude and
importance of measured quality for each parameter.

5. Partial Payoffs for Each Control Parameter.-The partial
payoff is a measure of that portion of final payoff which is attributable
to each control parameter.

6. Total Payoff.-This value represents the bonus payable to the
contractor on the basis of the indicated quality of the item.

DEVELOPMENT  OF  THE  GENERAL  MODEL 

This  section  traces  the  development  of  the  various  segments  of  the 
general  model.  Each  major  segment  of  the  model  is  expanded  and  quantified 
according  to  the  basic  assumptions  of  the  model. 

Selection  of  Control  Parameters. -Although  many  parameters  may  contribute 
to  the  performance  of  a  particular  item,  it  is  desirable  to  select  only 
a  few  parameters  to  measure  quality  for  incentive  purposes.  One  or  two 
parameters are ideal; any more than three may be unwieldy and impractical.
The parameters selected should be those which most clearly define item
effectiveness under operational conditions. Consequently, in addition
to minimizing the number of parameters selected, care must also be
taken to insure that all parameters which indicate effectiveness are
included. Thus, the number of parameters selected should be as
restrictive as possible, yet comprehensive enough to include all
significant parameters.

Furthermore,  it  is  important  that  parameter  measurements  be 
compatible  with  acceptance  tests  as  outlined  in  the  product  specification. 
Parameters  which  require  increased  sample  size  or  additional  testing 
in  order  to  be  measured  satisfactorily  are  not  desirable. 

Assignment of Weights.-Weights will be assigned to each control parameter
in multiples of ten within the range 0 to 100 (10, 20, 30, ..., 100). For
example, consider a situation involving two parameters where it is felt
that parameter A is 1 1/2 times as important as parameter B. The weights
assigned would be $W_a = 30$, $W_b = 20$ or $W_a = 60$, $W_b = 40$. As long as
the ratio is maintained it does not matter which combination of weights is
selected.

Determination of Maximum Payoff.-The maximum payoff (MPO) is selected as
that percentage of unit price the purchasing agent is willing to pay if
maximum incentive quality is obtained in all parameters.

Determination of Payoff Factors.-On the basis of the subjective-objective
decisions outlined above, a PAY OFF FACTOR ($POF_I$) is determined for each
parameter. The $POF_I$ is a function of:

(1) the individual weight of each parameter relative to the
combined weights of all parameters,

(2) maximum pay-off allowed in percent unit price.

Development of $POF_I$:

Initially  define  a  "quality  point"  as  a  measure  of  incentive 
quality  which  will  give  a  pay-off.  In  order  to  achieve  the  maximum 
pay-off  (MPO)  incentive,  the  contractor  must  achieve  the  maximum 
quality  points  (MQP)  which  have  been  assigned  for  each  parameter. 

Assuming a linear model where Zero "QP" would give Zero Pay-off, the
relationship between Pay-off (PO) and "QP" can be shown as in Figure 1 below:


Since each parameter is weighted in its importance to item effectiveness,
it shall be defined that the maximum QP for each parameter be equal
to the weight assigned to each parameter (i.e., if $W_a = 40$, $W_b = 20$;
$W_a = 2W_b$ - therefore parameter A is twice as important as parameter B and
will receive twice the number of quality points).

So, on an item basis:

(1) MQP = the sum of the weights for the parameters considered for
that item.

$$MQP = W_T = \sum_{I=1}^{N} W_I$$

where N is the number of parameters considered.

(2) MPO = Percentage of item unit price.


Therefore  Figure  1  becomes: 


Figure  2 

Parameter (I) with weight $W_I$ gives a pay-off factor ($POF_I$). $POF_I$ is
a proportional part of MPO as expressed below:

$$POF_I = \frac{(MPO) - (0)}{(W_T) - (0)} \, (W_I)$$

$$POF_I = \frac{MPO \, (W_I)}{W_T}$$

where $POF_I$ is a percentage. In a fractional form:

$$POF_I = \frac{(MPO)(W_I)}{(W_T)(100)} \tag{1}$$

A verification of the relationship is given in the appendix.


Determination  of  Partial  Payoff .-Using  the  payoff  factor  and  the  percentage 
The partial pay-off for Parameter I ($PPO_I$) is represented as:

$$PPO_I = (POF_I)(\% \text{ difference}) \tag{2}$$

where $PPO_I$ is a percentage of unit price.

Determination of Total Payoff.-It follows that the total payoff (TPO)
is the sum of all partial payoffs or

$$\text{Total Pay-off (TPO)} = \sum_I PPO_I \tag{3}$$

If we have 100% difference in AQL for each parameter, then
TPO (%) = MPO (%) - see Appendix for verification.
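
The payoff relations can be illustrated with a short sketch. The function
names are assumptions made here; the formulas are the fractional payoff factor
POF_I = (MPO)(W_I) / ((W_T)(100)), the partial payoff PPO_I = (POF_I)(%
difference), and their sum TPO. The weights 30 and 20 and the 10 per cent
maximum payoff are the figures used in the paper's own example.

```python
def payoff_factors(weights, mpo_percent):
    """Fractional payoff factor POF_I = (MPO)(W_I) / ((W_T)(100)) per parameter."""
    w_total = sum(weights.values())
    return {name: mpo_percent * w / (w_total * 100) for name, w in weights.items()}


def total_payoff(factors, percent_difference):
    """Partial payoffs PPO_I = POF_I * %D_I and their sum TPO (per cent of unit price)."""
    partial = {name: factors[name] * percent_difference[name] for name in factors}
    return partial, sum(partial.values())


weights = {"A": 30, "B": 20}
factors = payoff_factors(weights, mpo_percent=10)

# With a 100% difference in AQL for every parameter, TPO should equal MPO.
partial, tpo = total_payoff(factors, {"A": 100, "B": 100})
print(factors, partial, tpo)
```

Because the payoff factors are proportional to the weights and sum to
MPO/100, the total payoff reaches MPO only when every parameter shows the
full 100 per cent difference.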


"MEASURING"  INCENTIVE  QUALITY 
WHEN THE ACCEPTANCE NUMBER OF ALL PARAMETERS
IS FIVE OR GREATER

Incentive payoffs will be made on the basis of changes in quality.
For this model, these changes will be measured in terms of AQL. This
measure is, in fact, a pseudo-AQL ($AQL_p$).

As stated in the previous section, incentive quality is indicated
by the percentage difference. $AQL_s$ is the AQL for a parameter outlined
in the product specification. For brevity, $AQL_s$ is presented in the form
of the appropriate sampling plan as follows:

($AQL_s$ | code letter; n, x, x + 1) where $x \ge 5$        (4)

and: n = sample size
     x = acceptance number
     x + 1 = rejection number

All data is in accordance with MIL-STD-105D.




When sampling is conducted according to the specification sampling
plan and the number of defects in the observed sample is some x' < x, then
for convenience $AQL_p$ is defined as follows:

($AQL_p$ | code letter; n; x', x' + 1)        (5)

where: code letter is the same as in (4)
       n = sample size
       x' = acceptance number
       x' + 1 = rejection number

All data is in accordance with MIL-STD-105D. $AQL_p$ can be determined
using MIL-STD-105D, a Thorndike Chart or Poisson Tables.

It is important to point out that the $AQL_p$ does not mean that the
process average is actually equal to the $AQL_p$. The $AQL_p$ is a "dummy"
measure of quality. It merely says that -

if a sampling plan had been used with the code letter of (4), sample
size n, decision criteria x', x' + 1 - then the AQL associated with this
plan is $AQL_p$. It is the AQL of the sampling plan that has just been
passed.


The pseudo value is used in the incentive model to compute the
percentage change in $AQL_s$ or "the change in quality".

The percentage difference (%D) between $AQL_s$ and $AQL_p$ is computed by:

$$\%D = \frac{AQL_s - AQL_p}{AQL_s} \times 100$$

EXAMPLE OF DETERMINING $AQL_p$ AND $\%D_I$, WHEN
THE ACCEPTANCE NUMBER OF ALL PARAMETERS IS FIVE OR GREATER

Simple  Case  -  one  parameter  considered. 

Example: Consider a parameter with $AQL_s$ as follows: (1.0 | M; 315; 7, 8).

If in sampling the number of defects observed is 3 then x' = 3.
Hence, $AQL_p$ is defined according to equation (5) as ($AQL_p$ | M; 315; 3, 4).

Using MIL-STD-105D, $AQL_p$ = .40

Therefore:

$$\%D_I = \frac{AQL_s - AQL_p}{AQL_s} \times 100
        = \frac{(1.0 - .40)}{1.0} \times 100
        = 60\%$$

COMPLETE  EXAMPLE  FOR  COMPUTING  INCENTIVE  PAYOFF 
IN  WHICH  THE  ACCEPTANCE  NUMBER  OF  ALL  PARAMETERS 
IS  5  OR  GREATER 

Two significant parameters, A and B, have been selected for the
item in question. Subjective judgment indicates that Parameter A is
1-1/2 times as important as Parameter B.

Step  Information                          Value Obtained   Explanation

1     Weight for Parameter A: Wa           30
2     Weight for Parameter B: Wb           20
3     Sum of Weights (WT): Wa + Wb         50               30 + 20
4     Maximum Payoff: MPO                  10%              Subjective
5     Payoff Factor A (POFa):              .06              (10) (30) / ((50) (100))
        (MPO) (Wa) / ((WT) (100))
6     Payoff Factor B (POFb):              .04              (10) (20) / ((50) (100))
        (MPO) (Wb) / ((WT) (100))
The incentive clause indicates that two parameters A and B
will be used. Parameter A has an AQLs of 1.0%, POFa = .06.
Parameter B has an AQLs of .65%, POFb = .04. The size of
the lot for which a payoff is to be calculated is 15,000.
General Inspection Level II is to be used.

The number of defectives found in the sample for
Parameter A was 3 (x' = 3).

The number of defectives found in the sample for
Parameter B was 2 (x' = 2).

Step  Information                          Value Obtained     Explanation

7     Sampling Plan Code Letter            M
      Parameter A: AQLs; SS (x, x+1)       1.0%; 315 (7,8)    Product Specification
      Parameter B: AQLs; SS (x, x+1)       .65%; 315 (5,6)    Product Specification
8     Pseudo AQL                                              105-D
      Parameter A: AQLp                    .40%
      Parameter B: AQLp                    .25%
9     Percentage Difference (%D):
        (AQLs - AQLp) (100) / AQLs
      Parameter A: %D                      60%                (1.0 - .40) (100) / 1.0
      Parameter B: %D                      62%                (.65 - .25) (100) / .65
10    Partial Payoff (PPO) = (%D) (POF):
      Parameter A: PPOa                    3.6%               (60) (.06)
      Parameter B: PPOb                    2.5%               (62) (.04)
11    Total Payoff TPO = PPOa + PPOb:      6.1%               3.6 + 2.5
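The eleven steps above can be traced in a few lines of code. The following is a minimal Python sketch, not part of the original paper; the function names and the tuple layout for a parameter are illustrative assumptions. It works without intermediate rounding, so the total comes out slightly below the 6.1% obtained above, where %D for Parameter B was rounded to 62.

```python
def payoff_factor(mpo_pct, weight, total_weight):
    """POF = (MPO)(W) / ((WT)(100)), with MPO given in percent."""
    return mpo_pct * weight / (total_weight * 100.0)


def percent_difference(aql_spec, aql_pseudo):
    """%D = (AQLs - AQLp)(100) / AQLs."""
    return (aql_spec - aql_pseudo) * 100.0 / aql_spec


def total_payoff(mpo_pct, parameters):
    """Sum the partial payoffs PPO = (%D)(POF) over all parameters.

    `parameters` is a list of (weight, AQLs, AQLp) tuples; this layout
    is an assumption made for the sketch.
    """
    total_weight = sum(weight for weight, _, _ in parameters)
    return sum(
        percent_difference(aql_s, aql_p)
        * payoff_factor(mpo_pct, weight, total_weight)
        for weight, aql_s, aql_p in parameters
    )


# Parameters A and B of the worked example: (weight, AQLs %, AQLp %)
example = [(30, 1.0, 0.40), (20, 0.65, 0.25)]
tpo = total_payoff(10.0, example)
print(f"TPO = {tpo:.2f}% of unit price")
```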

"MEASURING"  INCENTIVE  QUALITY 
WHEN THE ACCEPTANCE NUMBER OF AT LEAST ONE
PARAMETER IS LESS THAN FIVE

Choose the control parameter whose sampling plan has the minimum X
less than five. The general approach to the problem will be to determine
from the requirements:

(AQLs | code letter; N; X, X + 1) where X < 5.

A second sampling plan will be defined as:

(AQLs | N'; Xa, Xa + 1) (7)

where Xa > X and N'/N is some whole number greater than one which
represents the cumulative number of lots that have to be sampled
before an incentive pay-off decision can be made. The conditions of
equation (7) can be satisfied by use of a standard Thorndike Chart or
Summation of Terms of Poisson's Exponential Binomial Limit.

Example

Consider the specification sampling plan:

(AQLs = 0.25% | N = 315; 2,3)

The AQL can also be expressed as fraction defective:

(AQLs = .0025 | N = 315; 2,3)

If an equivalent 5-6 plan (Xa = 5) is required, a Thorndike Chart
can be used. Defining the probability of acceptance at the AQL to be .95:

N' (AQL expressed as fraction defective) = 2.6
or N' (.0025) = 2.6
N' = 1040 items

The AQL could have also been written as a percentage:

N' (AQL (%)) = 260 (8)
N' = 1040

The equivalent sampling plan expressed as equation (7) is:

(AQLs = .25% | N' = 1040; 5,6)
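The factor 2.6 read from the Thorndike Chart is the Poisson mean at which the probability of 5 or fewer defects equals .95. As a hedged illustration (the paper itself uses charts and tables, not computation), the sketch below finds that mean by bisection on the Poisson cumulative sum; `poisson_cdf` and `poisson_factor` are hypothetical helper names.

```python
import math


def poisson_cdf(c, mean):
    """P(X <= c) for a Poisson variable with the given mean."""
    return math.exp(-mean) * sum(mean**k / math.factorial(k) for k in range(c + 1))


def poisson_factor(c, prob_accept=0.95, tol=1e-9):
    """Mean at which P(X <= c) equals prob_accept, found by bisection.

    The cumulative probability falls as the mean grows, so the root is
    bracketed between 0 and a generous upper bound.
    """
    lo, hi = 0.0, 10.0 * (c + 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if poisson_cdf(c, mid) > prob_accept:
            lo = mid  # acceptance probability still too high: raise the mean
        else:
            hi = mid
    return (lo + hi) / 2.0


factor = poisson_factor(5)        # close to the chart value of 2.6
n_equiv = factor / 0.0025         # AQL expressed as fraction defective
print(f"Poisson factor = {factor:.3f}, N' = {n_equiv:.0f}")
```

The computed sample size differs by a few items from the 1040 above because the chart value was rounded to 2.6.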

In general, equation (8) can be written for both the specification
sampling plan (N; X, X + 1) and the second equivalent sampling plan
(N'; Xa, Xa + 1):

(i): (N) (AQLs) = (Poisson factor X, X + 1) (100)
= (Π X, X + 1) (100)

(ii): (N') (AQLs) = (Poisson factor Xa, Xa + 1) (100)
= (Π Xa, Xa + 1) (100) (9)

so that

N' = (Π Xa, Xa + 1) (100) / AQLs

or

N'/N = (Π Xa, Xa + 1) (100) / ((Π X, X + 1) (100))

From equation (9):

(Π X, X + 1) (100) = (N) (AQLs)

So that

N'/N = (Π Xa, Xa + 1) (100) / ((N) (AQLs)) (10)

For convenience, modified Poisson factors for all Xa, Xa + 1 can
be derived for a probability of acceptance of .95. The numerator of
equation (10) can be written as:

(Π Xa, Xa + 1) (100) = Zi

Equation (10) now becomes:

N'/N = Zi / ((N) (AQLs)) (11)

Values of Zi for attribute sampling plans X, X + 1 are given below
in Table I.

TABLE I

X, X + 1     Zi

5 - 6        260
6 - 7        330
7 - 8        400
8 - 9        470
9 - 10       540
10 - 11      620
11 - 12      700
12 - 13      780
13 - 14      850
14 - 15      930
15 - 16      1000
16 - 17      1070
17 - 18      1170
18 - 19      1250
19 - 20      1320
20 - 21      1400

By letting L equal the number of cumulative lots, such that N'/N is
a whole number:

L ≈ li = Zi / ((N) (AQLs))

where the value of li which is closest to a whole number is chosen
as the value of L.

Example:

Consider the specification sampling plan:

(AQLs = .25% | N = 315; 2,3)

In this case:

(N) (AQLs) = (315) (.25) = 78.75

Therefore, applying equation (11) and Table I:

l(5-6) = 260 / 78.75 = 3.30
l(6-7) = 330 / 78.75 = 4.19
l(7-8) = 400 / 78.75 = 5.08
l(8-9) = 470 / 78.75 = 5.97

The last value is nearly a whole number and corresponds to an 8-9 plan.
Therefore L = 6 lots will be accumulated before an incentive pay-off
decision will be made. If 6 lots of sample size 315 each are accumulated,
then the adjusted sample size is:

N' = (L) (N) (12)

or N' = (6) (315) = 1890

In summary, the original sampling plan is:

(AQLs = .25% | 315; 2,3)

This is replaced with an equivalent plan by accumulating 6 lot
samples defined in accordance with equation (7):

(AQLs = .25% | 1890; 8,9)
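The selection of L from Table I is mechanical and can be sketched as follows. This is an illustrative Python sketch only: `choose_lots` and its `max_rows` argument are assumptions introduced here, the latter mimicking the paper's practice of examining only the first few rows of Table I before stopping.

```python
# Table I: acceptance number Xa -> Zi (probability of acceptance .95)
TABLE_I = {5: 260, 6: 330, 7: 400, 8: 470, 9: 540, 10: 620, 11: 700,
           12: 780, 13: 850, 14: 930, 15: 1000, 16: 1070, 17: 1170,
           18: 1250, 19: 1320, 20: 1400}


def choose_lots(n, aql_pct, max_rows=None):
    """Return (L, Xa, li) for the Table I row whose li = Zi / (N * AQLs)
    lies closest to a whole number.

    max_rows limits the search to the first rows of the table; None
    searches the whole table.
    """
    product = n * aql_pct
    rows = list(TABLE_I.items())[:max_rows]
    candidates = [(x_a, z / product) for x_a, z in rows]
    x_a, ratio = min(candidates, key=lambda c: abs(c[1] - round(c[1])))
    return round(ratio), x_a, ratio


lots, x_a, ratio = choose_lots(315, 0.25, max_rows=4)
print(f"L = {lots}, plan {x_a}-{x_a + 1}, N' = {lots * 315}")
```

How many rows to examine is a judgment call; a later row can occasionally lie closer to a whole number at the cost of accumulating more lots.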

Extension of this example by computing AQLp and payoff:

If 6 defects were encountered (X' = 6, X' + 1 = 7) in 6 lots of
sampling (total N' = 1890), the pseudo AQL (AQLp) can be determined as
follows. The definition of AQLp is the same as in the previous section: if

(AQLs = .25% | 1890; 8,9)
then (AQLp = ? | 1890; 6,7)

It is known that

(N) (AQLs) = (Zi for X, X + 1 plan).

If X, X + 1 and N are known, then AQLs can be determined. This is also
true for AQLp. In this example, (X', X' + 1) = (6,7) and N' = 1890.
Applying equations (9) to this situation:

(N') (AQLp) = (Zi for X', X' + 1 plan)
(N') (AQLp) = (Zi for 6,7 plan)
(1890) (AQLp) = 330
AQLp = .175%

The resulting percentage difference (%D) between AQLs and AQLp is:

%D = (.250 - .175) (100) / .250 = 30%

Note: These adjustments must be made in the sampling plans of all
(pertinent) item control parameters. The following example (although
repetitious in part) will include the complete computation as well as
the TPO.


A COMPLETE EXAMPLE FOR CALCULATING PAYOFF WHEN
THE ACCEPTANCE NUMBER OF AT LEAST ONE PARAMETER IS LESS THAN 5

Parameter A: N = 32, (0 - 1); AQLs = .40; WA = 30
Parameter B: N = 50, (1 - 2); AQLs = 1.00; WB = 20
MPO = 10%

WT = WA + WB = 50

POFa = (MPO) (WA) / ((WT) (100)) = (10) (30) / ((50) (100)) = .06

POFb = (MPO) (WB) / ((WT) (100)) = (10) (20) / ((50) (100)) = .04

Because both of the sampling plans have acceptance numbers less
than 5, the number of cumulative lots (samples) must be determined
in order to determine equivalent sampling plans:

li = Zi / ((N) (AQLs))

Plan A has X as a minimum (X = 0):

(N) (AQLs) = (32) (.40) = 12.8

Referring to Table I:

l1 = 260 / 12.8 = 20.31
l2 = 330 / 12.8 = 25.78
l3 = 400 / 12.8 = 31.25
l4 = 470 / 12.8 = 36.72
l5 = 540 / 12.8 = 42.19
(arbitrarily stop)

l2 = 25.78 is nearest to a whole number (26), therefore L = 26 lots
will be accumulated. If 26 lots of sample size 32 are accumulated, the
adjusted sample size N' is:

N' = (L) (N) = (26) (32) = 832
In summary, the original plan is:

(AQLs = .40 | 32; (0 - 1))

But Z2 of 330 (see Table I) corresponds to a 6 - 7 plan (i.e., Xa = 6,
Xa + 1 = 7) so that the revised plan is:

(AQLs = .40 | 832; (6 - 7)) for Parameter A

An appropriate adjustment must be made for Parameter B.

The original plan for Parameter B is:

(AQLs = 1.00 | 50; 1 - 2)

Since L = 26 lots will be accumulated, the adjusted sample size N' is:

N' = (L) (N) = (26) (50) = 1300

Since:

(N') (AQLs) = Zi
(1300) (1.00) = Zi
Zi = 1300

Reference to Table I shows that the nearest entry, Zi = 1320, corresponds
with a 19 - 20 plan. The revised plan for Parameter B is therefore:

(AQLs = 1.00 | 1300; 19 - 20)

In actual sampling the following defects were counted:

Parameter A - 5 defects
Parameter B - 10 defects

The pseudo AQLs are computed:

For Parameter A:

(N') (AQLp) = Zi (X' = 5, X' + 1 = 6 for 5 - 6 plan)
(832) (AQLp) = 260 from Table I
AQLp = .31

For Parameter B:

(N') (AQLp) = Zi (X' = 10, X' + 1 = 11 for 10 - 11 plan)
(1300) (AQLp) = 620 from Table I
AQLp = .48

Therefore:

%Da = (.40 - .31) (100) / .40 = 22.5%

%Db = (1.00 - .48) (100) / 1.00 = 52.0%

The partial pay-offs for each parameter are:

PPOa = (POFa) (%Da)
= (.06) (22.5%)
= 1.35%

PPOb = (POFb) (%Db)
= (.04) (52%)
= 2.08%

Therefore the total pay-off awarded to the contractor after 26
lots were produced and sampled was:

TPO = PPOa + PPOb
= 1.35% + 2.08%
= 3.43% of unit price
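The pseudo AQL step of this example reduces to a lookup in Table I followed by a division. A minimal Python sketch under that reading follows; `pseudo_aql` and `partial_payoff` are hypothetical names, and the lookup only covers defect counts of 5 through 20, the range of Table I. Because the sketch does not round AQLp to two decimals, its total lands slightly below the 3.43% above.

```python
# Table I: acceptance number -> Zi (probability of acceptance .95)
TABLE_I = {5: 260, 6: 330, 7: 400, 8: 470, 9: 540, 10: 620, 11: 700,
           12: 780, 13: 850, 14: 930, 15: 1000, 16: 1070, 17: 1170,
           18: 1250, 19: 1320, 20: 1400}


def pseudo_aql(total_sample, defects):
    """AQLp (percent) from (N')(AQLp) = Zi for the (defects, defects + 1) plan."""
    return TABLE_I[defects] / total_sample


def partial_payoff(pof, aql_spec, aql_pseudo):
    """PPO = (POF)(%D), with %D = (AQLs - AQLp)(100) / AQLs."""
    return pof * (aql_spec - aql_pseudo) * 100.0 / aql_spec


lots = 26
aqlp_a = pseudo_aql(lots * 32, 5)     # Parameter A: 5 defects in 832 items
aqlp_b = pseudo_aql(lots * 50, 10)    # Parameter B: 10 defects in 1300 items
tpo = partial_payoff(0.06, 0.40, aqlp_a) + partial_payoff(0.04, 1.00, aqlp_b)
print(f"AQLp A = {aqlp_a:.3f}, AQLp B = {aqlp_b:.3f}, TPO = {tpo:.2f}%")
```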


CONCLUSIONS 

Although some effort has been expended in investigating the development
of quality incentive payoffs, it is believed that this paper makes a significant
contribution in the area. This contribution is evidenced by the investigation
and extension of previously formulated concepts and the synthesis of standard
statistical techniques. In particular, an effort is made to make provision
for the following common situations which normally occur in actual acceptance
sampling plans:

1. Items having several parameters which contribute to overall effectiveness
in varying degrees.

2. Items/parameters with acceptance sampling plans specifying small
acceptance numbers (i.e., C < 5) which inherently lack an expanded range of
quality measure (AQLp and %D).

3. Items/parameters specifying sampling plans with an acceptance
number equal to zero.

Provisions for the above situations were established by defining a
procedure for selecting and weighting item parameters relating to effectiveness.
Furthermore, quality incentive pay-off decisions for sampling plans requiring
small acceptance numbers were incorporated into the model by cumulating the
results of several product lots.

The procedures presented in this paper are not considered to constitute
a sophisticated approach to formulating quality incentive plans. Intuitively,
the basic philosophical framework is believed to be workable; however, the
overall model should certainly lend itself to further refinement and
simplification. Some restrictive features of the model which would be
adaptable to future work are:

(1) Restricted to single sampling by attributes.

(2) Limited to simple functioning items.

(3) Considerable subjective judgment involved in selection and
weighting of parameters.

(4) Loss of incentive impact due to complexity of special procedures
for cases in which C < 5.

(5) Undesirable time factor due to lot accumulation when acceptance
number is very small.

(6) Rounding error in Zi values.


SPDTPMTiTV 


I. Property:

Σ POFi = MPO (in terms of percentage)

Demonstration:

WT = W1 + W2 + ... + Wi + ... + Wn

Therefore:

W1/WT + W2/WT + ... + Wi/WT + ... + Wn/WT = 1

It has been established:

POFi = MPO (Wi) / WT (in terms of percentage)

Σ POFi = Σ MPO (Wi) / WT

Expanding the summation:

Σ POFi = MPO (W1/WT + W2/WT + ... + Wi/WT + ... + Wn/WT)

or Σ POFi = MPO
II. Property:

If the percent difference between the specification AQL and the
pseudo AQL is 100 percent, then:

TPO = MPO (in percent)

Demonstration:

TPO (%) = Σ PPOi
= Σ (POFi) (% difference)
= Σ (POFi) (100%)

where POFi is a fraction, therefore:

TPO (%) = Σ (POFi) (in percent)

But it has been shown in I that:

Σ (POFi) = MPO

Therefore:

TPO (%) = MPO (%)

OPTIMUM SAMPLING PLANS FOR GRADING BINOMIAL POPULATIONS


Paul  B.  Nickeus 

U. S. Army Ballistic Research Laboratories
Aberdeen Proving Ground, Maryland

INTRODUCTION AND BACKGROUND. In the surveillance evaluation of
ammunition an important task is that of grading lots on the basis of
attribute characteristics of a sample drawn from larger populations. At
the present time, lots are placed into one of three grades based on the
performance of a random sample of n items chosen from the lot. It is of
obvious importance that the probability of misgrading a lot based on this
sample be made a minimum. The basis for the current grading procedure is a
BRL report written by Mr. A. Golub entitled "The Determination of Acceptance
Numbers for Placing a Lot from which a Single Sample is Drawn into One of
Three Grades", published in 1951. In this report, Mr. Golub maximizes the
probability of correct grading by differentiating expressions of the
type $\binom{n}{i} p^i q^{n-i}$, setting the resulting values equal to zero
and solving for the c values (acceptance numbers).

Mr. Golub's report serves as a basis for the following paper, in which
a different method of maximizing the probability of correct grading is
developed. A generalized solution is given and tables are developed for
lot classification into 2, 3, or 4 grades.

THEORETICAL DISCUSSION. In determining the original acceptability of
large quantities of manufactured products or in checking the reliability of
items which have been in storage for some time, groups of the product are
submitted for inspection (testing) in divisions called lots. These lots
can often be characterized by a certain property, or set of properties, of
the individual members of the lot. For example, a population of artillery
projectiles can be divided into those which are defective and those which
are not; a group of washer fittings can be divided into those which fit a
five-inch setting and those which do not. We let x be a random variable
which assumes the value 0 if an individual in the lot has none of the
characterizing properties and 1 if the individual possesses one or more of
these properties. If we now let p = P(x = 1), then the lot is defined to be a
lot with fraction defective p, and an individual which exhibits one or more
of the characterizing properties is called a defective item.

This article was reproduced photographically from the manuscript submitted by
the author.

In dealing with large lots, it is frequently too expensive or time
consuming to examine or test each item in the lot. (In fact, where the
procedure calls for the destruction of the item, it is impossible to inspect
every item.) Thus, some type of sampling inspection plan must be devised.

One of the more common types of sampling plans is the so-called single-
sampling plan, where the consumer selects a random sample of size n from
the lot; if the number of defective items in the sample is less than
or equal to a given number c, the lot is accepted, and if c + 1 or more
defectives are found in the sample, the lot is rejected.
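As a small illustration of the single-sampling rule just described, the probability of accepting a lot with fraction defective p is the binomial probability of finding c or fewer defectives in n items. The Python sketch below is not from the paper; `prob_accept` is a hypothetical name, and sampling is treated as binomial (a large lot relative to the sample).

```python
from math import comb


def prob_accept(n, c, p):
    """P(r <= c) when r ~ Binomial(n, p): the chance the lot is accepted."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(c + 1))


# Sample of 20 items, acceptance number 1, lot 10% defective
print(f"P(accept) = {prob_accept(20, 1, 0.10):.4f}")
```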

This concept can be readily extended to situations involving classification
of a lot into more than two classes, say, three, four, or any number up to k
classes or grades. Let us assume that for each of the k grades, an interval
has been determined such that, if the lot fraction defective is in this
interval, the lot belongs to that grade. These intervals or levels can be
determined by a review of the specifications for the item or by considering
the requirements established by the user or consumer for the reliability of
the item.

Now, let us suppose for convenience that our stockpile consists of
exactly 100 lots which have the corresponding fraction defectives
$y_0 = 0$, $y_1 = .01$, $y_2 = .02$, ..., $y_{98} = .98$, $y_{99} = .99$. One of
these lots is selected at random and submitted to our sampling plan. We let
p be the lot fraction defective for this lot.

We now want to place this lot into one of k grades in accordance with
the following: if the lot fraction defective is less than $p_1$
($0 \le p \le p_1$), the lot is of Grade A quality; if the lot fraction
defective is between $p_1$ and $p_2$ ($p_1 < p \le p_2$), the lot is of
Grade B quality; if the lot fraction defective is between $p_2$ and $p_3$
($p_2 < p \le p_3$), the lot is of Grade C quality; and so on, out to the
final grade, that is, if the lot fraction defective is more than $p_{k-1}$
($p_{k-1} < p \le 1$), the lot is of Grade K quality.


Our plan now calls for selecting a random sample of n items from the
lot, inspecting (testing) each item in the sample and determining the number
of items (r) which are defective. The lot will then be placed into one of
the k grades using the following rule:

If $0 \le r \le c_1$: place the lot in Grade A
If $c_1 + 1 \le r \le c_2$: place the lot in Grade B
If $c_2 + 1 \le r \le c_3$: place the lot in Grade C
...
If $c_{k-1} + 1 \le r \le n$: place the lot in Grade K
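The grading rule above amounts to locating r among the sorted acceptance numbers. A minimal Python sketch, with `assign_grade` as a hypothetical name and grades lettered A, B, C, ... in order:

```python
from bisect import bisect_left


def assign_grade(r, acceptance_numbers, n):
    """Return the grade letter for r defectives in a sample of n items.

    acceptance_numbers is the increasing list [c1, c2, ..., c(k-1)].
    """
    if not 0 <= r <= n:
        raise ValueError("r must lie between 0 and n")
    # bisect_left counts the acceptance numbers strictly below r, which is
    # exactly the index of the first grade whose upper limit is >= r.
    return chr(ord("A") + bisect_left(acceptance_numbers, r))


# Three grades with c1 = 6 and c2 = 13 for a sample of 45 items
print([assign_grade(r, [6, 13], 45) for r in (0, 6, 7, 13, 14, 45)])
```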

Under this set of conditions, we can use the formula of total
probability* to calculate the probability of placing a given lot into its
proper grade; in other words, we are determining the probability of
correctly calling a Grade t lot its actual grade, Grade t. This gives

$$
\begin{aligned}
P ={}& P(\text{placing the lot in the correct grade}) \\
={}& P\{p = 0\}\,P\{0 \le r \le c_1 \mid p = 0\} + P\{p = .01\}\,P\{0 \le r \le c_1 \mid p = .01\} + \dots \\
& + P\{p = p_1\}\,P\{0 \le r \le c_1 \mid p = p_1\} \\
& + P\{p = p_1 + .01\}\,P\{c_1 + 1 \le r \le c_2 \mid p = p_1 + .01\} + \dots + P\{p = p_2\}\,P\{c_1 + 1 \le r \le c_2 \mid p = p_2\} \\
& + P\{p = p_2 + .01\}\,P\{c_2 + 1 \le r \le c_3 \mid p = p_2 + .01\} + \dots + P\{p = p_3\}\,P\{c_2 + 1 \le r \le c_3 \mid p = p_3\} + \dots \\
& + P\{p = p_{k-1} + .01\}\,P\{c_{k-1} + 1 \le r \le n \mid p = p_{k-1} + .01\} + \dots + P\{p = .99\}\,P\{c_{k-1} + 1 \le r \le n \mid p = .99\}
\end{aligned}
\tag{1}
$$

Because the lot which was submitted to the plan was selected at random
from the 100 lots available in the stockpile, we know that

$$P\{p = 0\} = P\{p = .01\} = \dots = P\{p = .99\} = \frac{1}{100} \tag{2}$$

The probability expressed in the second bracket of each product can be
written as a sum of binomial probabilities of the form

$$\sum_{r} \binom{n}{r} p^r (1-p)^{n-r} \tag{3}$$

* B. V. Gnedenko, "The Theory of Probability", page 64.

Using expressions (2) and (3), we can rewrite (1) in the following form:

$$
\begin{aligned}
P ={}& \frac{1}{100}\sum_{r=0}^{c_1}\binom{n}{r}(0)^r(1)^{n-r} + \frac{1}{100}\sum_{r=0}^{c_1}\binom{n}{r}(.01)^r(.99)^{n-r} + \dots + \frac{1}{100}\sum_{r=0}^{c_1}\binom{n}{r}p_1^r(1-p_1)^{n-r} \\
& + \frac{1}{100}\sum_{r=c_1+1}^{c_2}\binom{n}{r}(p_1+.01)^r(1-p_1-.01)^{n-r} + \dots + \frac{1}{100}\sum_{r=c_1+1}^{c_2}\binom{n}{r}p_2^r(1-p_2)^{n-r} \\
& + \frac{1}{100}\sum_{r=c_2+1}^{c_3}\binom{n}{r}(p_2+.01)^r(1-p_2-.01)^{n-r} + \dots + \frac{1}{100}\sum_{r=c_2+1}^{c_3}\binom{n}{r}p_3^r(1-p_3)^{n-r} + \dots \\
& + \frac{1}{100}\sum_{r=c_{k-1}+1}^{n}\binom{n}{r}(p_{k-1}+.01)^r(1-p_{k-1}-.01)^{n-r} + \dots + \frac{1}{100}\sum_{r=c_{k-1}+1}^{n}\binom{n}{r}(.99)^r(.01)^{n-r}
\end{aligned}
\tag{4}
$$


In order to maximize the probability of putting a lot into the correct
category, we must maximize (4) with respect to $c_1, c_2, c_3, \dots, c_{k-1}$.
However, we obviously do not want to limit ourselves to the case where N
(number of lots in the stockpile) = 100, but rather want to generalize our
approach so that N can go to infinity and thus consider the case where p can
assume any value on the closed interval [0,1] with equal probability.

First, we must look briefly at the definition of a definite integral.
Consider a function f(p) which is continuous on the interval [a,b], (a < b),
except at one or more points of the form p = a + t/N, where
t = 1, 2, ..., N(b - a), and is everywhere non-negative on this interval. The
graph of this function (using three grades as an example) can be represented
by a sketch.

[Sketch not reproduced.]

We now divide [a,b] into N equal intervals, with the length of each
interval equal to $\Delta x_i$. In each segment choose points
$e_1, e_2, \dots, e_N$ and consider the sum

$$f(e_1)\,\Delta x_1 + f(e_2)\,\Delta x_2 + \dots + f(e_N)\,\Delta x_N \tag{5}$$

which is equal to $\sum_{i=1}^{N} f(e_i)\,\Delta x_i$.

Since all the intervals are equal, $\Delta x_i = \dfrac{b-a}{N}$, and (5) can
be written as

$$\frac{b-a}{N}\sum_{i=1}^{N} f(e_i)$$

and by definition

$$\lim_{\substack{N \to \infty \\ \Delta x_i \to 0}} \sum_{i=1}^{N} f(e_i)\,\Delta x_i = \lim_{N \to \infty} \frac{b-a}{N}\sum_{i=1}^{N} f(e_i) = \int_a^b f(p)\,dp \tag{6}$$

This is exactly the form of (4), the sum of which we are seeking to
maximize with respect to $c_1, c_2, c_3, \dots, c_{k-1}$, if we let
$N \to \infty$. Thus, if we maximize

$$
s = \int_0^{p_1}\sum_{r=0}^{c_1}\binom{n}{r}p^r(1-p)^{n-r}\,dp + \int_{p_1}^{p_2}\sum_{r=c_1+1}^{c_2}\binom{n}{r}p^r(1-p)^{n-r}\,dp + \dots + \int_{p_{k-1}}^{1}\sum_{r=c_{k-1}+1}^{n}\binom{n}{r}p^r(1-p)^{n-r}\,dp
\tag{7}
$$

with respect to $c_1, c_2, c_3, \dots, c_{k-1}$, we have maximized the
probability of placing a lot in the correct grade given that it was selected
at random from a population of lots whose fraction defective has a uniform
distribution on the unit interval. Thus, the use which we have made of
integration is equivalent to placing a uniform prior distribution on p, the
true lot fraction defective.

Our problem now becomes one of choosing those values of
$c_1, c_2, \dots, c_{k-1}$ which maximize (7) for given values of
$p_1, p_2, p_3, \dots, p_{k-1}$ and n.

As an example for the case k = 3 (3 grades) we can illustrate graphically
by "operating-characteristic" curves the area which we wish to be a maximum.

[Figure not reproduced: operating-characteristic curves plotted against
percent defective, showing the probability of assigning each grade and the
probability of assigning the correct grade. The area we wish to maximize is
indicated by the shaded area.]


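We now express s in (7) as

$$
\begin{aligned}
s ={}& \int_0^{p_1}\sum_{i=0}^{c_1}\binom{n}{i}p^i(1-p)^{n-i}\,dp
 + \int_0^{p_2}\sum_{i=c_1+1}^{c_2}\binom{n}{i}p^i(1-p)^{n-i}\,dp
 - \int_0^{p_1}\sum_{i=c_1+1}^{c_2}\binom{n}{i}p^i(1-p)^{n-i}\,dp \\
& + \int_0^{p_3}\sum_{i=c_2+1}^{c_3}\binom{n}{i}p^i(1-p)^{n-i}\,dp
 - \int_0^{p_2}\sum_{i=c_2+1}^{c_3}\binom{n}{i}p^i(1-p)^{n-i}\,dp + \dots \\
& + \int_0^{1}\sum_{i=c_{k-1}+1}^{n}\binom{n}{i}p^i(1-p)^{n-i}\,dp
 - \int_0^{p_{k-1}}\sum_{i=c_{k-1}+1}^{n}\binom{n}{i}p^i(1-p)^{n-i}\,dp
\end{aligned}
\tag{8}
$$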


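Let

$$Q = \int_0^{p_1}\binom{n}{i}p^i(1-p)^{n-i}\,dp = \binom{n}{i}\int_0^{p_1}p^i(1-p)^{n-i}\,dp$$

Integrate by parts: $u = p^i$, $dv = (1-p)^{n-i}\,dp$, $du = i\,p^{i-1}\,dp$,
$v = -\dfrac{(1-p)^{n-i+1}}{n-i+1}$.

$$
Q = \binom{n}{i}\left\{-\frac{p_1^i(1-p_1)^{n-i+1}}{n-i+1} + \int_0^{p_1}\frac{i}{n-i+1}\,p^{i-1}(1-p)^{n-i+1}\,dp\right\}
$$

Integrate by parts: $u = p^{i-1}$, $dv = (1-p)^{n-i+1}\,dp$,
$du = (i-1)\,p^{i-2}\,dp$, $v = -\dfrac{(1-p)^{n-i+2}}{n-i+2}$.

Now

$$
Q = \binom{n}{i}\left\{-\frac{p_1^i(1-p_1)^{n-i+1}}{n-i+1} + \frac{i}{n-i+1}\left[-\frac{p^{i-1}(1-p)^{n-i+2}}{n-i+2}\right]_0^{p_1} + \frac{i}{n-i+1}\int_0^{p_1}\frac{i-1}{n-i+2}\,p^{i-2}(1-p)^{n-i+2}\,dp\right\}
$$

which equals, after simplification,

$$
Q = -\frac{1}{n+1}\binom{n+1}{i}p_1^i(1-p_1)^{n+1-i} - \frac{1}{n+1}\binom{n+1}{i-1}p_1^{i-1}(1-p_1)^{n-i+2} + \frac{1}{n+1}\binom{n+1}{i-1}(i-1)\int_0^{p_1}p^{i-2}(1-p)^{n-i+2}\,dp
$$

which can be written as

$$
Q = -\frac{1}{n+1}\sum_{j=i-1}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j} + \frac{1}{n+1}\binom{n+1}{i-1}(i-1)\int_0^{p_1}p^{i-2}(1-p)^{n-i+2}\,dp
$$

Again integrate by parts: $u = p^{i-2}$, $dv = (1-p)^{n-i+2}\,dp$,
$du = (i-2)\,p^{i-3}\,dp$, $v = -\dfrac{(1-p)^{n-i+3}}{n-i+3}$.

$$
Q = -\frac{1}{n+1}\sum_{j=i-1}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j} + \frac{1}{n+1}\binom{n+1}{i-1}(i-1)\left\{\left[-\frac{p^{i-2}(1-p)^{n-i+3}}{n-i+3}\right]_0^{p_1} + \frac{i-2}{n-i+3}\int_0^{p_1}p^{i-3}(1-p)^{n-i+3}\,dp\right\}
$$

which equals, after some simplification,

$$
Q = -\frac{1}{n+1}\sum_{j=i-1}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j} - \frac{1}{n+1}\binom{n+1}{i-2}p_1^{i-2}(1-p_1)^{n+3-i} + \frac{1}{n+1}\binom{n+1}{i-2}(i-2)\int_0^{p_1}p^{i-3}(1-p)^{n+3-i}\,dp
$$

and, combining terms,

$$
Q = -\frac{1}{n+1}\sum_{j=i-2}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j} + \frac{1}{n+1}\binom{n+1}{i-2}(i-2)\int_0^{p_1}p^{i-3}(1-p)^{n+3-i}\,dp
$$

Continuing to evaluate the integral by integrating by parts, we
come to this term:

$$
\begin{aligned}
Q &= -\frac{1}{n+1}\sum_{j=1}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j} + \frac{1}{n+1}\binom{n+1}{1}\int_0^{p_1}p^0(1-p)^n\,dp \\
&= -\frac{1}{n+1}\sum_{j=1}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j} + \left[-\frac{(1-p)^{n+1}}{n+1}\right]_0^{p_1} \\
&= -\frac{1}{n+1}\sum_{j=1}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j} - \frac{1}{n+1}(1-p_1)^{n+1} + \frac{1}{n+1} \\
&= \frac{1}{n+1}\left\{1 - \sum_{j=0}^{i}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j}\right\}
\end{aligned}
$$

Thus, finally we have

$$
\int_0^{p_1}\binom{n}{i}p^i(1-p)^{n-i}\,dp = \frac{1}{n+1}\sum_{j=i+1}^{n+1}\binom{n+1}{j}p_1^j(1-p_1)^{n+1-j}
$$

and we have conveniently gone from the integral of a binomial to the sum
of another binomial expansion.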
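The identity can be checked numerically for particular values. The following Python sketch is an illustration added here, not part of the original derivation; the function names and the choice of a midpoint rule are assumptions. It compares a direct numerical integration of the left-hand side with the binomial sum on the right.

```python
from math import comb


def integral_numeric(n, i, p1, steps=20000):
    """Midpoint-rule estimate of the integral over [0, p1] of
    C(n, i) p^i (1 - p)^(n - i) dp."""
    h = p1 / steps
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) * h
        total += comb(n, i) * p**i * (1 - p)**(n - i)
    return total * h


def closed_form(n, i, p1):
    """(1 / (n + 1)) * sum over j = i+1 .. n+1 of
    C(n + 1, j) p1^j (1 - p1)^(n + 1 - j)."""
    tail = sum(comb(n + 1, j) * p1**j * (1 - p1)**(n + 1 - j)
               for j in range(i + 1, n + 2))
    return tail / (n + 1)


print(integral_numeric(10, 3, 0.2), closed_form(10, 3, 0.2))
```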

Now, making use of this expression in the original "s" equation,
we have, for an integral whose upper limit is 1,

$$
\int_0^{1}\binom{n}{i}p^i(1-p)^{n-i}\,dp = \frac{1}{n+1}\sum_{j=i+1}^{n+1}\binom{n+1}{j}(1)^j(0)^{n+1-j}
$$

All terms in the summation vanish (= 0) except for the last term when
j = n+1, where we have $(1)^{n+1}(0)^0 = 1$, so that the integral equals
$1/(n+1)$. For a sum of such integrals,

$$
\int_0^{p_2}\sum_{i=a}^{b}\binom{n}{i}p^i(1-p)^{n-i}\,dp = \frac{1}{n+1}\sum_{i=a}^{b}\sum_{j=i+1}^{n+1}\binom{n+1}{j}p_2^j(1-p_2)^{n+1-j}
$$

And using these expressions in our original expression (8), and writing
$B_j(p) = \binom{n+1}{j}p^j(1-p)^{n+1-j}$ for brevity, we have

$$
\begin{aligned}
s = \frac{1}{n+1}\Bigg[& \sum_{i=0}^{c_1}\sum_{j=i+1}^{n+1}B_j(p_1) + \sum_{i=c_1+1}^{c_2}\sum_{j=i+1}^{n+1}B_j(p_2) - \sum_{i=c_1+1}^{c_2}\sum_{j=i+1}^{n+1}B_j(p_1) \\
& + \sum_{i=c_2+1}^{c_3}\sum_{j=i+1}^{n+1}B_j(p_3) - \sum_{i=c_2+1}^{c_3}\sum_{j=i+1}^{n+1}B_j(p_2) + \dots \\
& + (n - c_{k-1}) - \sum_{i=c_{k-1}+1}^{n}\sum_{j=i+1}^{n+1}B_j(p_{k-1})\Bigg]
\end{aligned}
$$

Shifting the index i by one, this becomes

$$
\begin{aligned}
s = \frac{1}{n+1}\Bigg[& \sum_{i=1}^{c_1+1}\sum_{j=i}^{n+1}B_j(p_1) + \sum_{i=c_1+2}^{c_2+1}\sum_{j=i}^{n+1}B_j(p_2) - \sum_{i=c_1+2}^{c_2+1}\sum_{j=i}^{n+1}B_j(p_1) \\
& + \sum_{i=c_2+2}^{c_3+1}\sum_{j=i}^{n+1}B_j(p_3) - \sum_{i=c_2+2}^{c_3+1}\sum_{j=i}^{n+1}B_j(p_2) + \dots \\
& + (n - c_{k-1}) - \sum_{i=c_{k-1}+2}^{n+1}\sum_{j=i}^{n+1}B_j(p_{k-1})\Bigg]
\end{aligned}
$$

which can be written as

$$
\begin{aligned}
s = \frac{1}{n+1}\Bigg[& \sum_{i=1}^{c_1+1}P(x \ge i;\, n+1, p_1) + \sum_{i=c_1+2}^{c_2+1}P(x \ge i;\, n+1, p_2) - \sum_{i=c_1+2}^{c_2+1}P(x \ge i;\, n+1, p_1) \\
& + \sum_{i=c_2+2}^{c_3+1}P(x \ge i;\, n+1, p_3) - \sum_{i=c_2+2}^{c_3+1}P(x \ge i;\, n+1, p_2) + \dots \\
& + (n - c_{k-1}) - \sum_{i=c_{k-1}+2}^{n+1}P(x \ge i;\, n+1, p_{k-1})\Bigg]
\end{aligned}
$$

where $P(x \ge i;\, n+1, p_t)$ is defined to be the probability that the random
variable x is greater than or equal to i if it has the binomial distribution
with parameters n+1 and $p_t$.

We now make use of any convenient table of cumulative binomial probabilities
for several different values of n. Thus, for a fixed sample size and given quality
levels $p_1, p_2, p_3, \dots, p_{k-1}$ we can, by use of a high-speed electronic
computer, compute values of s for every $c_1, c_2, \dots, c_{k-1}$ combination
and choose that combination which gives the maximum value of s.

GENERAL APPLICATIONS AND EXAMPLES.

For the Case K = 2

The classification of a lot into two grades is, for most situations,
equivalent to either accepting or rejecting the lot. For example, a
quality control analyst might be willing to accept as satisfactory a 10%
defect rate for flash bulbs. Thus, if he took a sample of 20 bulbs from
one hour's production, the appropriate table indicates he would allow one
defective bulb in the sample before rejecting that hour's output.

For the Case K = 3

Here, the purpose might be to place a given lot of artillery fuzes
which have been in storage for some time into one of three grades: Grade
A, indicating those lots acceptable for unrestricted use; Grade B, those
lots generally acceptable with certain restrictions; and Grade C, those
lots unacceptable for future use.

Given a sample size of 45 and prescribed quality levels of 15% and 30%,
the appropriate table indicates we would allow 6 defects for a Grade A lot and
up to 13 defects for a Grade B lot.
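The search the paper assigns to a high-speed computer can be sketched directly. The Python sketch below is an illustration under the uniform-prior model of the theoretical discussion, not the author's program; the function names are assumptions. It evaluates s for every increasing combination of acceptance numbers, using the closed form for the integral of a binomial term, and returns the combination with the largest s.

```python
from itertools import combinations
from math import comb


def upper_tail(n, p, i):
    """P(x >= i) for x ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(i, n + 1))


def term_area(n, i, a, b):
    """Integral over [a, b] of C(n, i) p^i (1-p)^(n-i) dp, via the identity
    (1/(n+1)) * P(x >= i + 1; n + 1, limit) applied at both limits."""
    return (upper_tail(n + 1, b, i + 1) - upper_tail(n + 1, a, i + 1)) / (n + 1)


def best_acceptance_numbers(n, levels):
    """Return (acceptance numbers, s) maximizing the probability of
    correct grading for quality levels p1 < p2 < ... < p(k-1)."""
    bounds = [0.0, *levels, 1.0]
    k = len(bounds) - 1
    # area[g][i]: contribution to s when a count of i defectives is assigned
    # to grade g, whose true fraction defective lies in [bounds[g], bounds[g+1]]
    area = [[term_area(n, i, bounds[g], bounds[g + 1]) for i in range(n + 1)]
            for g in range(k)]

    def score(cs):
        edges = (-1, *cs, n)
        return sum(area[g][i]
                   for g in range(k)
                   for i in range(edges[g] + 1, edges[g + 1] + 1))

    best = max(combinations(range(n), k - 1), key=score)
    return best, score(best)


cs, s = best_acceptance_numbers(45, [0.15, 0.30])
print(f"acceptance numbers {cs}, probability of correct grading {s:.3f}")
```

For the two cases above (n = 20 with a 10% level, and n = 45 with levels of 15% and 30%) the search can be compared with the acceptance numbers quoted from the tables.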

For the Case K = 4

An example here might be the case where an electronics dealer would be
willing to pay x dollars for a lot of batteries which are of Grade A quality,
y dollars (y < x) for a lot of Grade B quality, z dollars (z < y < x) for a
lot of Grade C quality, and reject as unacceptable lots of Grade D quality.

If, for a sample of size 200, the respective quality levels are 1%
(Grade A), 10% (Grade B), and 25% (Grade C), the appropriate table calls for
acceptance numbers 2, 20, and 50.
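
The grading rule implied by these examples can be sketched in a few lines of Python. This is an illustrative sketch only: the function name grade_lot and the letter labels are assumptions introduced here, and the acceptance numbers are simply those quoted in the examples above, not values recomputed from the tables. A lot receives the first grade whose acceptance number is not exceeded by the observed number of defectives, and the last grade otherwise.

```python
from bisect import bisect_left
from string import ascii_uppercase


def grade_lot(defectives, acceptance_numbers):
    """Return the grade letter for a sample with `defectives` defective items.

    `acceptance_numbers` is the non-decreasing sequence c_1, ..., c_{K-1}.
    A lot is placed in grade i when c_{i-1} < defectives <= c_i, and in the
    last grade K when defectives exceeds every acceptance number.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if defectives < 0:
        raise ValueError("number of defectives must be non-negative")
    if list(acceptance_numbers) != sorted(acceptance_numbers):
        raise ValueError("acceptance numbers must be non-decreasing")
    # bisect_left counts the acceptance numbers strictly below `defectives`,
    # which is the zero-based index of the grade.
    index = bisect_left(acceptance_numbers, defectives)
    return ascii_uppercase[index]


print(grade_lot(5, (6, 13)))       # three grades, sample of 45 -> A
print(grade_lot(13, (6, 13)))      # -> B
print(grade_lot(21, (2, 20, 50)))  # four grades, sample of 200 -> C
```

With a single acceptance number the same function reduces to the accept/reject decision of the case K = 2, where "A" means accept and "B" means reject.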

FURTHER RECOMMENDATIONS. The use of the uniform prior distribution is
a fairly conservative approach but would seem to have realistic applications
for newly manufactured items or items for which little is known of the
functioning characteristics.

It would be interesting to consider some other prior distributions.
A simple one, which seems both reasonable and easy to handle mathematically,
would be to assume p is uniformly distributed on the interval [0, 0.50], i.e.,
assume that no lot is more than 50% defective and guard against misgrading
any lot with fraction defective between 0 and 50% with equal protection.

Another interesting distribution to consider would be

f(p) = 2(1 - p),  0 ≤ p ≤ 1

     = 0,         otherwise

This distribution assumes lots with p almost zero are most likely in
the stockpile, lots with p almost equal to one are quite rare, and, for an
interval of fixed width, the probability that a < p < b decreases linearly
as a and b increase.
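
As a check on this prior, the normalization and the probability assigned to an interval follow directly by integration. The short derivation below uses only the density stated above; the symbol w for the interval width is introduced here for convenience.

```latex
\[
\int_0^1 2(1-p)\,dp = \bigl[\,2p - p^2\,\bigr]_0^1 = 1,
\qquad
P(a < p < b) = \int_a^b 2(1-p)\,dp = (b-a)(2-a-b),
\quad 0 \le a < b \le 1 .
\]
\[
\text{With } b = a + w:\qquad P(a < p < a + w) = w\,(2 - w - 2a).
\]
```

For a fixed width w the probability therefore falls linearly as the interval is moved toward p = 1.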
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APPENDIX 


TABLES OF ACCEPTANCE NUMBERS


THREE GRADES N=10

P2\P1   .01  .02  .03  .04  .05  .06  .07  .08  .09  .10  .11  .12  .13  .14  .15

.02     0,1
.03     0,1  0,1
.04     0,1  0,1  0,1
.05     0,1  0,1  0,1  0,1
.06     0,1  0,1  0,1  0,1  0,1
.07     0,1  0,1  0,1  0,1  0,1  0,1
.08     0,1  0,1  0,1  0,1  0,1  0,1  0,1
.09     0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1
.10     0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1
.11     0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1
.12     0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1
.13     0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1
.14     0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1
.15     0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1  0,1

.16 
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0,1 
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0,1 
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0,1 

0,1 

0,1 

0,1 
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0,1 

0,1 

.20 

0,1 

0,1 
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0,1 

0,1 
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0,1 
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0,1 
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.23 
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0.1 

0.1 
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0.1 
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0.1 
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0,2 

0,2 

0,2 
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0,2 
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0,3 
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0,2 
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•34 

0,3 
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0,3 

0,3 
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0.3 
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1.3 

■  36 

0.3 

0,3 
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0,3 
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•  37 
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.39 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

1.3 

.1*0 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0.3 

0,3 

1,3 

.1*1 

0,3 

0,3 

0,3 

0.3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

0,3 

1,3 

.1*2 

0,1* 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1,4 

.1*3 

0,1* 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1.4 

.1*1* 

0,1* 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1,4 

.**5 

0,1* 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1,4 

.1*6 

0,1* 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1,4 

.1*7 

0,1* 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1.4 

.1*8 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1.4 

.1*9 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1,4 

.50 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,4 

0,3 

0,4 

1,4 
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THHti  SPADES  11*10 
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.01* 
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.19 
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0,1  1,2  1,2 
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1,2  1,2  1,2 
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1,2  1,2  1,2 

.21* 
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.25 
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.26 

1,2  1,2  1,2 
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1,2  1,2  1,2 
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1,2  1,2  1,2 
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1,2  1,2  1,2 

.31* 

1,3  1,2  1,2 

.35 

1,3  1,3  1,3 

.36 

1,3  1,3  1,3 

.37 

1,3  1,3  1,3 
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1,3  1,3  1,3 
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7.12  8,11  9,11 
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2,6 

3,6 

3,6 

4,5 

4,5 

5,6 

5,6  5,6 

.15 

0,7 

0,7 

0,7 

1,7 

1.7 

2.6 

3,6 

3,6 

*,6 

5,6 

5,6 

5,6  6,7 

6.7 

.16 

0,7 

0,7 

0,7 

1,7 

1,7 

2,7 

2,7 

3,7 

4,7 

4,7 

5,6 

6,7  6,7 

6,7 

6,7 

.17 

0,8 

0,8 

0,8 

1,8 

1,8 

2.8 

2,8 

3,7 

4,7 

4,7 

5,7 

6,7  6,7 

6,7 

7,8 

.18 

0,8 

0,8 

0,8 

1,8 

1,8 

2,8 

2,8 

3.8 

4,8 

4.8 

5,8 

5,8  6,7 

7,8 

7,8 

.19 

0,9 

0,9 

0,9 

1,9 

1,9 

2,9 

2,9 

3,9 

4,9 

4,8 

5.8 

5,8  6,8 

7,8 

7,8 

.20 

0,9 

0,9 

0,9 

1,9 

1,9 

2,9 

2,9 

3.9 

3,9 

4,9 

5,9 

5,9  6,9 

7,8 

7,8 

.21 

0,10 

0,10 

0,10 

1,10 

1,10 

2,10 

2,10 

3,10 

3,10 

4,10 

5,10 

5,9  6,9 

6,9 

7,9 

.22 

0,10 

0,10 

0,10 

1,10 

1,10 

2,10 

2,10 

3,10 

3,10 

4,10 

5,10 

5,10  6,10 

6,10 

7,10 

.23 

0,11 

0,11 

0,11 

1,11 

1,11 

2,11 

2,11 

3,11 

3,11 

4,11 

5.11 

5,11  6,11 

6,10 

7,10 

.24 

0,11 

0,11 

0,11 

1,11 

1.11 

2,11 

2,11 

3,11 

3,11 

4,11 

5,11 

5,11  6,11 

6,11 

7,11 

.25 

0,12 

0,12 

0,12 

1,12 

1,12 

2,12 

2,12 

3,12 

3,12 

4,12 

4,12 

5,12  6,12 

6,12 

7,12 

.26 

0,12 

0,12 

0,12 

1,12 

1,12 

2,12 

2,12 

3,12 

3,12 

4,12 

4,12 

5,12  6,12 

6,12 

7,12 

,27 

0,13 

0,13 

0,13 

1.13 

1,13 

2,13 

2,13 

3,13 

3,13 

4,13 

4,13 

5,13  6,13  6,13 

7,13 

.28 

0,13 

0,13 

0,13 

1,13 

1,13 

2,13 

2,13 

3,13 

3,13 

4,13 

4,13 

5,13  6,13 

6,13 

7,13 

.29 

0,14 

0,14 

0,l4 

1,14 

1,14 

2,14 

2,14 

3,14 

3,14 

4,14 

4,14 

5,14  6,l4 

6,14 

7,14 

.30 

0,14 

0,14 

0,l4 

1,14 

1,14 

2,14 

2,14 

3,14 

3,14 

4,14 

4,l4 

5,14  6,14 

6,14 

7,14 

.31 

0,15 

0,15 

0,15 

1,15 

1,15 

2.15 

2,15 

3,15 

3,15 

4,15 

4,15 

5,15  6,15 

6,15 

7,15 

.32 

0,15 

0,15 

0,15 

1,15 

1,15 

2,15 

2,15 

3.1? 

3,15 

4,15 

4,51 

5,15  6,15 

6,15 

7,15 

.33 

0,l6 

0,l6 

0,16 

1,16 

l,l6 

2,16 

2,16 

3,l6 

3,16 

4,16 

4,16 

5,16  6,16 

6,16 

7,16 

.34 

0,16 

0,16 

0,l6 

1,16 

1,16 

2,16 

2,16 

3,16 

3,16 

4,l6 

4,16 

5,16  6,16 

6,1 6 

7,16 

.35 

0,17 

0,17 

0,17 

1,17 

1,17 

2,17 

2,17 

3,17 

3,17 

4,17 

4,17 

5,17  6,17 

6,17 

7,17 

.36 

0,17 

0,17 

0,17 

1,17 

1.17 

2,17 

2,17 

3,17 

3,17 

4,17 

4,17 

5,17  6,17 

6,17 

7,17 

.37 

0,18 

0,18 

0,18 

1,18 

1,18 

2,18 

2,18 

3,18 

3,18 

4,18 

4,18 

5,18  6,18 

6,18 

7,18 

.38 

0,18 

0,18 

0,18 

1,18 

1,18 

2,16 

2,18 

3,18 

3,18 

4,l8 

4,18 

5,18  6,18 

6,18 

7,18 

.39 

0,19 

0,19 

0,19 

1,19 

1.19 

2,19 

2,19 

3,19 

3,19 

4,19 

4,19 

5,19  6,19 

6,19 

7,19 

.40 

0,19 

0,19 

0,19 

1,19 

1,19 

2,19 

2,19 

3,19 

3,19 

4,19 

4,19 

5,19  6,19 

6,19 

7,19 

.41 

0,20 

0,20 

0,20 

1,20 

1,20 

2,20 

2,20 

3,20 

3,20 

4,20 

4,20 

5,20  6,20 

6,20 

7,20 

.42 

0,20 

0,20 

0,20 

1,20 

1,20 

2,20 

2,20 

3,20 

3,20 

4,20 

4,20 

5,20  6,20 

6,20 

7,20 

.43 

0,21 

0,21 

0,21 

1.21 

1,21 

2,21 

2,21 

3,21 

3,21 

4,21 

4,21 

5,21  6,21 

6,21 

7,21 

.44 

0,21 

0,21 

0,21 

1.21 

1,21 

2,21 

2,21 

3,21 

3,21 

4,21 

4,21 

5,21  6,21 

6,21 

7,21 

.45 

0,22 

0,22 

0,22 

1,22 

1,22 

2,22 

2,22 

3,22 

3,22 

4,22 

4,22 

5,22  6,22 

6,22 

7,22 

.46 

0,22 

0,22 

0,22 

1,22 

1,22 

2,22 

2,22 

3.22 

3,22 

4,22 

4,22 

5,22  6,22 

6,22 

7,22 

.47 

0,23 

0,23 

0,23 

1,23 

1,23 

2,23 

2,23 

3,23 

3,23 

4,23 

4,23 

5,23  6,23 

6,23 

7,23 

.48 

0,23 

0,23 

0,23 

1,23 

1,23 

2,23 

2,23 

3,23 

3,23 

4,23 

4,23 

5,23  6,23 

6,23 

7,23 

.49 

0,24 

0,24 

0,24 

1,24 

1,24 

2,24 

2,24 

3,24 

3,24 

4,24 

4,24 

5,24  6,24 

6,24 

T,24 

.50 

0,24 

0,24 

0,24 

1,24 

1,24 

2,24 

2,24 

3,24 

3,24 

4,24 

4,24 

5,24  6,24 

6,24 

7,24 

THREE GRADES N=50
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FOUR  GRADES  N*50 


ACCEPT. 

PI  P2  P3  NOS. 

Pi  P2  P3 

.01  .05  .10  0,271+ 

.05  .10  .15 

•15  0,1,7 

.20 

.20  0,1,9 

.25 

.25  0,1,12 

•  30 

.30  0,1,11+ 

.35 

•35  0,1,17 

.40 

.40  0,1,19 

.45 

.1+5  0,1,22 

•  50 

•50  0,1,24 

.15  .20 

.10  .15  0,5,6 

.25 

.20  0  ,l+,9 

.30 

.25  0 ,1+  ,12 

.35 

•  30  0,l+,ll+ 

...  .40 

.35  o.i+.rr 

.45 

•  1+0  0,4,19 

.50 

.45  0,4,22 

.50  0,4,24 

.20  .25 

•  30 

.15  .20  0,7,8 

•  35 

•25  0,7,12 

.40 

•30-  0,7,14 j 

.45 

•35  0,7,17 

•  50 

.40  0,7,19 

•  1+5  0,7,22 

.25  .30 

.50  0,7,24 

.35 

.40 

•20  .25  0,10,11 

.45 

•30  0,9,14 

.50 

•35  0,9,17 

•  J*o,  0,9,19 

•  30  .35 

.45  0,9,22 

.40 

•50  0,9,24 

.45 

•  50 

.25  .30  0,12,13 

•35  0,12,17 

.40  0,12,19 

.45  0,12,22 

.10  .15  .20 

•50  0,12,24 

•  25 

.30 

•30  .35  0,15,16 

•  35 

.40  0,15,19 

.40 

•45  0,14,22 

.45 

•50  0,14,24 

•  50 

ACCEPT. 

NOS. 

PI 

P2 

P3 

ACCEPT. 

NOS. 

.10 

.20 

.25 

4,10,11 

2,4,9 

•  30 

4,9,14  i 

2,4,12 

.35 

**,9,17 

2,4,14 

.40 

4,9,19 

2,4,17 

.45 

4,9,22 

2,4,19 

•  50 

4,9,24 

2,4,22 

2,4,24 

.25 

.30 

4,12,13 

•  35 

4,12,17 

1,7,8 

.40 

4,12,19 

1,7,12 

.45 

4,12,22 

1.7.14 

•  50 

4,12,24  • 

1,7,17 

1,7,19 

.30 

.35 

4,15,16 

1,7,22 

.40 

4,15,19 

1,7,24 

.45 

4,14,22 

.50 

4 ,14, ^4 

1,10,11 

1,9,14 

•  15 

.20 

.25 

7,9,10 

1,9,1? 

.30 

7,8,14 

1,9,19 

•  35 

7,8,17 

1,9,22 

.40 

7,8,19 

'  1,9,24 

.45 

7,8,22 

* 

•  50 

7,8,24 

1,12,13 

• 

. 

1,12,17 

.25 

.30 

7,12,13 

1,12,19 

•  35 

T  ,12 ,17 

1,12,22 

.40 

7,12,19 

1,12,24 

.45 

7,12,22 

.50 

7,12,24 

1,15,16 

1,15,19 

.30 

•  35 

7,15,16 

1,14,22 

.40 

7,15,19 

1,14,24 

.45 

7,14,22 

.50 

7,14,24 

.20 

.25 

.30  10,12,13 

5,7,8 

.35  10,11,17 

5  ,0 ,12 

•40  10,11,19 

5,6,14 

•45  10,11,22 

5,6,17 

.50  10.11.24 

5,6,19 

5,6,22 

•  30 

•35 

9 ,15  ,16 

5,6,24 

.4o 

9,14,19 

.45 

9 ,14 ,22 

•  50 

9,14,24 
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FOUR  GRADES  N-60 


ACCEPT. 

PI  P2  P3  NOS. 

ACCEPT. 

PI  P2  P3  NOS. 

ACCEPT. 

PI  P2  P3  NOS. 

.01  .05  .10  0,2,5 

.05  .10  .15  2,5,6

.10  .20  .25  5,12,13 

.15  0,2,8 

.20  2,5,11

.30  5,11,17 

.20  0,2,11 

.25  2,5,14 

.35  5,11,20 

.25  0,2,14

.30  2,5,17 

.40  5,11,23 

.30  0,2,17 

.35  2,5,20 

.45  5,11,26 

.35  0,2,20 

.40  2,5,23 

.50  5,11,29 

.40  0,2,23 

.45  2,5,26 

.45  0,2,26

•50  2,5,29 

.25  .30  5,15,16 

•50  0,2,29 

.35  5,14,20 

.15  .20  2,9,10 

.40  5,14,23 

.10  .15  0,5,8 

.25  2,8,14 

.45  5,14,26 

.20  0,5,11 

.30  2,8,17 

.50  5,14,29 

.25  0,5,14

.35  2,8,20 

.30  0,5,17 

.40  2,8,23 

.30  .35  5,18,19 

.35  0,5,20 

.45  2,8,26 

.40  5,18,23 

.40  0,5,23

.50  2,8,29 

.45  5,17,26 

.45  0,5,26

.50  5,17,29 

•50  0,5,29 

.20  .25  2,12,13 

.30  2,11,17 

.15  .20  .25  9,11,13 

.15  .20  0,9,10 

.35  2,11,20 

.30  9,10,17 

.25  0,8,14

.40  2,11,23 

.35  9,10,20 

.30  0,8,17 

.45  2,11,26 

.40  9,10,23 

.35  0,8,20 

.50  2,11,29 

.45  9,10,26 

.40  0,8,23

.50  9,10,29 

.45  0,8,26

.25  .30  2,15,16 

.50  0,8,29 

.35  2,14,20 

.25  .30  8,15,16 

.40  2,14,23 

.35  8,14,20 

.20  .25  0,12,13 

.45  2,14,26 

.40  8,14,23 

.30  0,11,17 

.50  2,14,29 

.45  8,14,26 

.35  0,11,20 

.50  8,14,29 

.40  0,11,23 

.30  .35  2,18,19 

.45  0,11,26 

.40  2,18,23 

.30  .35  8,18,19 

.50  0,11,29 

.45  2,17,26 

.40  8,17,23 

.50  2,17,29 

.45  8,17,26 

.25  .30  0,15,16 

.50  8,17,29 

.35  0,14,20 

.40  0,14,23 

.20  .25  .30  12,14,16 

.45  0,14,26 

.10  .15  .20  5,8,10 

.35  12,13,20 

.50  0,14,29 

.25  5,8,14 

.40  12,13,23 

.30  5,8,17

.45  12,13,26 

.30  .35  0,18,19 

.35  5,8,20 

.50  12,13,29 

.40  0,18,23 

.40  5,8,23

.45  0,17,26 

.45  5,8,26 

.30  .35  11,18,19 

.50  0,17,29 

•50  5,8,29 

.40  11,17,23 

.45  11,17,26 

.50  11,17,29 
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FOUR  GRADES  N=70 

ACCEPT.  1 

ACCEPT.  I 

ACCEPT. 

.01  .0? 

.10 

0,3,6 

.15 

0,2,10 

.20 

0,2,13 

.25 

0,2,17 

.30 

0,2,20 

.35 

0,2,24 

.40 

0,2,27 

.45 

0,2,31 

•  50 

0,2,34 

.10 

.15 

0,6,9 

.20 

0,6,17 

.30 

0,6,20 

•  35 

0,6,24 

.40 

0,6,27 

.45 

0,6,31 

‘ 

.50 

0,6,34 

.15 

.20 

0,10,12 

.25 

0,10,17 

.30 

0,10,20 

.35 

0,10,24 

.40 

0,10,27 

.45 

0,10,31 

.50 

0,10,34 

.20 

.25 

0,,l4,l6 

.  30 

0  20 

V  )  A.  J  1  fc  V 

.35 

0,13,24 

.40 

0,13,27 

.45 

0,13,31 

.50 

0,13,34 

.25 

.30 

0,18,19 

.35 

0,17.24 

.40 

0,17,27 

•  45 

0,17,31 

.50 

0,17,34 

.30 

.35 

0,22,23 

.40 

0,20,27 

.45 

0,20,31 

1 

.50 

0,20,34 

.15 

.20 

2,10,12 

.25 

2,10,17 

.35 

2,10,24 

.40 

2,10,27 

.45 

2,10,31 

.50 

2,10,34 

.20 

.25 

2,14,16 

.30 

2,13,20 

.35 

2,13,24 

.40 

2,13,27 

.45 

2,13,31 

.50 

2,13,34 

.25 

.30 

2,18,19 

.35 

2,17,24 

.40 

2,17,27 

.45 

2,17,31 

.50 

2,17,34 

•  30 

.35 

2,22,23 

.40 

2,20,27 

.45 

2,20,31 

.50 

2,20,34 

.15 

.20 

6,10,12 

.25 

6,9,17 

.30 

6,9,20 

.35 

6,9,24 

.40 

6,9,27 

.45 

6,9,31 

.50 

6,9,34 

PI 

P2 

P3 

NOS. 

10 

.20 

.25 
.30 
•  35 
.40 

6  .lV.iS" 
6,13,20 
6,13,24 
6,13,27 

.45 

6,13,31 

.50 

6,13,34 

.25 

.30 

6,18,19 

.35 

6,17,24 

.40 

6,17,27 

.45 

6,17,31 

.30 

.35 

6,22,23 

.40 

6,20,27 

.45 

6,20,31 

.50 

6,20,34 

15 

.20 

.25 

10,13,16 

.30 

10,12,20 

.35 

10,12,24 

.40 

10,12,27 

.45 

10,12,31 

.50 

10,12,34 

.25 

.30 

10,18,19 

.35 

10,17,24 

.40 

10,17,27 

.45 

10,17,31 

.50 

10,17,34 

•  30 

.35 

10,22,23 

.40 

10,20,27 

.45 

10,20,31 

.50 

10,20,34 

20 

.25 

.30 

14,17,19 

•  35 

14,16,24 

.40 

14,16,27 

.45 

14 ,16,31 

.50 

14,16,34 

.30 

.35 

13,21,22 

.40 

13,20,27 

.45 

13,20,31 

•  50 

13,20,34 

FOUR GRADES N=80


PI  P2  P3  NOS. 


r  x . 

NOS. 


3,7.11 


PI  P2  P3  NOS. 


.1*5 

0,3,35 

.50 

0,3,39 

.10 

.15 

0,7,11 

.20 

0,7,15 

.25 

0,7,19 

.30 

0,7,23 

.35 

0,7,27 

.Uo 

0,7,31 

.1*5 

0,7,35 

.50 

0,7,39 

.15 

.20 

0,12,11* 

.25 

0,11,19 

.30 

0,11,23 

.35 

0,11,27 

.1*0 

0,11,31 

.1*5 

0,11,35 

.50 

0,11,39 

.20 

.25 

0,16,18 

.30 

0,15,23 

.35 

0,15,27 

.1*0 

0,15,31 

.1*5 

0,15,35 

•  50 

0,15,39 

.25 

.30 

0,20,22 

.35 

0,19,27 

.1*0 

0,19,31 

.1*5 

0,19,35 

.50 

0,1939 

.30 

.35 

0,24,26 

.1*0 

0,23,31 

.1*5 

0,23,35 

.50 

0,23,39 

3,7,35 


30 

7,15,23 

35 

7,15,27 

1*0 

7,15,31 

**5 

7,15,35 

50 

7,15,39 

30 

7,20,22 

35 

7,19,27 

1*0 

7,19,31 

1*5 

7,19,35 

50 

7,19,39 

35  7,24,26 

40 

7,23,31 

1*5 

7,23,35 

50 

7.23.39 

.15  .20 

.25 

12,15,18 

.30 

12,14,23 

.35 

12,14,27 

.40 

12,14,31 

.45 

12,14,35 

.50 

12,14,39 

.25 

.30 

11,20,22 

.35 

11,19,27 

.40 

11,19,31 

.45 

11,19,35 

.50 

11,19,39 

.30 

.35 

11,24,26 

.40 

11,23,31 

•  >*5 

11,23,35 

•  50 

11,23,39 

.20  .25 

.30 

16,19,22 

.35 

16,18,27 

.40 

16,18,31 

.45 

16,18,35 

.50 

16,18,39 

.30 

.35 

15,24,26 

.40 

15,23,31 

.45 

15,23,35 

.50 

15,23,39 

FOUR  GRADES  N=90 


ACCEPT. 

PI  P2  P3  NOS. 

ACCEPT. 

PI  P2  P3  NOS. 

ACCEPT. 

PI  P2  P3  NOS . 

.01  .05  .10  0,U, 8 

.05  .10  .15  4,8,12 

"715  755  725  ST,i6 ,21 

.15  0,3,13 

.20  4,8,17 

.30  8,17,26 

.20  0,3,17 

.25  4,8,22 

.35  8,17,31 

.25  0,3,22 

.30  4,8,26 

.40  8,17,35 

•30  0,3,26 

.35  4,8,31 

.45  8,17,40 

.35  0,3,31 

.40  4,8,35 

.50  8,17,44 

.1*0  0,3,35 

.45  4,8,40 

.45  0,3,40 

.50  4,8,44 

.25  .30  8,23,25 

.50  0,3,44 

.35  8,22  ,  31 

.15  .20  3,13,16 

.40  8,22,35 

.10  .15  0,8,12 

.25  3,13,22 

.45  8,22,40 

.20  0,8,17 

.30  3,13,26 

.50  8,22,44 

.25  0,8,22 

.35  3,13,31 

.30  0,8,26 

.40  3,13,35 

.30  .35  8,27,29 

.35  0,8,31 

.45  3,13,40 

.40  8,26,35 

.40  0,8,35 

.50  3,13,44 

.45  8,26,40 

.45  0,8,40 

.50  8,26,44 

.50  0,9,44 

.20  .25  3,18,21' 

.30  3,17,26 

.15  .20  .25  13,17,21 

.15  .20  0,13,16 

.35  3,17,31 

.30  13,17,26 

.25  0,13,22 

.40  3,17,35 

.35  13,16,31 

.30  0,13,26 

.45  3,17,40 

.40  13,16,35 

.35  0,13,31 

•50  3,17,44 

.45  13,16,40 

.40  0,13,35 

.50  13,16,44 

.45  0,13,40 

.25  .30  3,23,25 

.50  0,13,44 

.35  3,22,31 

.25  .30  13,23,25 

.40  3,22,35 

.35  13,22,31 

.20  .25  0,18,21 

.45  3,22,40 

.40  13,22,35 

.30  0,17,2 6 

.50  3,22,44 

.45  13,22,40 

.35  0,17,31 

.50  13,22,44 

.40  0,17,35 

.30  .35  3,27,29 

.45  0,17,40 

.40  3,26,35 

.30  .35  13,27,29 

.50  0,17,44 

.45  3,26,40 

.40  13,26,35 

.50  3,26,44 

.45  13,26,40 

.25  .30  0,23,25 

.50  13,26,44 

.35  0,22,31 

.40  0,22,35 

.20  .25  .30  18,23,25 

.45  0,22,40 

.10  .15  .20  8,13,16 

.35  16,22,31 

•50  0,22,44 

.25  8,12,22 

.40  18,22,35 

.30  8,12,2 6 

.45  18,22,40 

.30  .  35  0,27,29 

.35  8,12,31 

.50  3  8,22,44 

.40  0,26,35 

.40  8,12,35 

.45  0,26,40 

.45  8,12,40 

.30  .35  17,27,29 

.50  0,26,44 

.50  8,12,44 

.40  17,26,35 

.45  17,26,40 

.50  17,26,44 
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FOUR  GRADES  R-lOO 


ACCEPT. 


ACCEPT. 


ACCEPT. 


P2  F3 


P2  P3 


.45 

0,4, 1*4 

.50 

0,4,49 

10 

.15 

0,9,14 

.20 

0,9,19 

.25 

0,9,24 

.30 

0,9,29 

.35 

0,9,34 

.1*0 

0,9,39 

.**5 

0,9,44 

.50 

0,9,49 

.15 

.20 

0,15,19 

.25 

0,14,24 

.30 

0,14,29 

.35 

0,14,34 

.1*0 

0,14,39 

.1*5 

0,14,44 

.50 

0,14,49 

.20 

.25 

0,20,23 

.30 

0,19,29 

.35 

0,19,34 

.1*0 

0,19,39 

.1*5 

0,19,44 

.50 

0,19,49 

.25 

.30 

0,25,28 

.35 

0,24,34 

.1*0 

0,24,39 

.1*5 

0,24,44 

.50 

0,24,49 

.30 

.35 

0,29,33 

.1*0 

0,29,39 

.1*5 

0,29,44 

.50 

0,29,49 

25 

9,2o,23 

.30 

9  ,19 ,29 

.35 

9,19,34 

.40 

9,19,39 

.45 

9,19,44 

.45 

9  ,24,44 

.50 

9,24,49 

.30 

.35 

9,29,33 

.40 

9,29,39 

.45 

9,29,44 

.50 

9 ,29 ,49 

.20 

.25 

15,19,23 

.30 

15 ,19 ,29 

.35 

15,19,34 

.40 

15 ,19,39 

.45 

15,19,44 

.50 

15,19,49 

.25 

.30 

15,25 ,28 

.35 

15,24,34 

.40 

15 ,24,39 

.45 

15 ,24,44 

.50 

15 ,24,49 

.30 

•  35 

14,29,33 

.40 

14,29,39 

.45 

14,29,44 

.50 

14,29,49 

.25 

.30 

19,24,28 

.35 

19,23,34 

.40 

19,23,39 

.45 

19,23,44 

.50 

19,23,49 

.30 

.35 

19,29,33 

.40 

19 ,29,39 

.45 

19,29,44 

.50 

19,29,49 

.05  .10  0,5,11
     .15  0,5,18
     .20  0,5,24
     .25  0,5,30
     .30  0,5,36
     .35  0,5,43
     .40  0,5,49
     .45  0,5,55
     .50  0,5,61

.10  .15  0,11,18
     .20  0,11,24
     .25  0,11,30
     .30  0,11,36
     .35  0,11,43
     .40  0,11,49
     .45  0,11,55
     .50  0,11,61

.15  .20  0,18,23
     .25  0,18,30
     .30  0,18,36
     .35  0,18,43
     .40  0,18,49
     .45  0,18,55
     .50  0,18,61

.20  .25  0,25,29
     .30  0,24,36
     .35  0,24,43
     .40  0,24,49
     .45  0,24,55
     .50  0,24,61

.25  .30  0,31,36
     .35  0,31,43
     .40  0,31,49
     .45  0,31,55
     .50  0,31,61

.30  .35  0,37,42
     .40  0,37,49
     .45  0,37,55
     .50  0,37,61


,05  .10  .15 

.20 


.15  .20 

.25 


.20  .25 
.30 
.35 
.kO 
•  k5 
.50 

.25  .30 

.35 


.30  .35 
.ko 
.  k5 
•  50 


5,12,17 

5,11,2k 

5,11,30 

5.11.36 

5.11, k3 

5.11, k9 

5.11.55 
5,11,61 

5,18,23 

5,18,30 

5.18.36 

5.18,  k3 

5.18, k9 

5.18.55 

5,18,61 

5,25,29 
5, 2k, 36 
5,2k,k3 
5,2k, k9 
5, 2k, 55 
5,2k,6l 

5.31.36 

5.31, k3 

5.31, k9 
5,31,55' 

5.31.61 

5 . 37,  k2 

5.37, k9 

5.37.55 

5.37.61 


12,18,23 

11,18,30 

11,18,36 

11.18, k3 

11.18, k9 
11,18,55 
11,18,61 


S. 


.20  .25  11,25,29
     .30  11,24,36
     .35  11,24,43
     .40  11,24,49
     .45  11,24,55
     .50  11,24,61

.25  .30  11,31,36
     .35  11,31,43
     .40  11,31,49
     .45  11,31,55
     .50  11,31,61

.30  .35  11,37,42
     .40  11,37,49
     .45  11,37,55
     .50  11,37,61

.20  .25  18,24,29
     .30  18,24,36
     .35  18,24,43
     .40  18,24,49
     .45  18,24,55
     .50  18,24,61

.25  .30  18,31,36
     .35  18,31,43
     .40  18,31,49
     .45  18,31,55
     .50  18,31,61

.30  .35  18,37,42
     .40  18,37,49
     .45  18,37,55
     .50  18,37,61

.25  .30  24,30,36
     .35  24,30,43
     .40  24,30,49
     .45  24,30,55
     .50  24,30,61

.30  .35  24,37,42
     .40  24,37,49
     .45  24,37,55
     .50  24,37,61
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30  .35 

.1*0 
.U5 
.50 


6,22,29 

6,22,36 

6,22,1*1* 

6.22.51 

6.22.59 
6,22,66 
6,22,71* 

6,29,36 

6,29,1*1* 

6.29.51 

6.29.59 
6,29,66 
6,29,71* 

6,37,1*3 

6.37.51 

6.37.59 

6.37.66 
6,37,71* 

6, 1*1*, 51 

6.44.59 

6.44.66 
6,44,74 


14,22,29 

14,22,36 

14,22,44 

14,22,51 

14,22,59 

14,22,66 

14,22,74 


S. 


     .25  14,29,36
     .30  14,29,44
     .35  14,29,51
     .40  14,29,59
     .45  14,29,66
     .50  14,29,74

.25  .30  14,37,43
     .35  14,37,51
     .40  14,37,59
     .45  14,37,66
     .50  14,37,74

.30  .35  14,44,51
     .40  14,44,59
     .45  14,44,66
     .50  14,44,74

.20  .25  21,29,36
     .30  21,29,44
     .35  21,29,51
     .40  21,29,59
     .45  21,29,66
     .50  21,29,74

.25  .30  21,37,43
     .35  21,37,51
     .40  21,37,59
     .45  21,37,66
     .50  21,37,74

.30  .35  21,44,51
     .40  21,44,59
     .45  21,44,66
     .50  21,44,74

.25  .30  29,37,43
     .35  29,37,51
     .40  29,37,59
     .45  29,37,66
     .50  29,37,74

.30  .35  29,44,51
     .40  29,44,59
     .45  29,44,66
     .50  29,44,74


FOUR  GRADES  N=200 


PI  P2  P3 

01  705  no 

.15 

.20 

.25 

.30 

.35 

.40 

.45 

.50 


.10  .15 

.20 
.25 
.30 
.35 
.40 
.45 
.50 


ACCEPT. 

NOS. Pi 
“1,9,19  .05 

1,9,29 
1,9,39 
1,9,49 
1,9,59 
1,9,69 
1,9,79 
1,9,89 
1,9,99 

1,19,29 

1,19,39 

1,19,49 

1,19,59 

1,19,69 

1,19,79 

1,19,89 

1,19,99 


P2 

.10 


15 


20 


.20 

1,29,39 

.25 

1,29,49 

.30 

1,29,59 

.35 

1,29,69 

.40 

1,29,79 

.45 

1,29.89 

.50 

1,29,99 

.25 

1,39,49 

.30 

1,39,59 

.35 

1,39,69 

.40 

1,39,79 

.45 

1,39,89 

.50 

1,39,99 

.25  .30 

.35 
.40 
.45 
.50 

.30  .35 

.40 
.45 
.50 


1,49,59 

1.49.69 

1.49.79 

1.49.89 

1.49.99 

1.59.69 

1.59.79 

1.59.89 

1.59.99 


.10  .15 


P3 

ACCEPT. 

NOS. 

PI 

P2 

P3 

ACCEPT 

NOS. 

.15 

9,19,2' 

9  .10 

.20 

.25 

19,39,49’ 

.20 

9,19,3' 

9 

.30 

19,39,59 

.25 

9,19,4' 

9 

.35 

19,39,69 

.30 

9.19,5' 

9 

.40 

19,39,79 

•  35 

9,19,6! 

9 

.45 

19,39,89 

.40 

9,19, 7< 

J 

.50 

19,39,89 

.45 

9,19,8! 

s 

.50 

9,19,95 

! 

.25 

.30 

19,49,59 

.20 

.35 

19,49,69 

9,29,3= 

1 

.40 

19,49,79 

» 25 

9,29,4= 

1 

.45 

19,49,89 

.30 

9,29,5= 

.50 

19,49,99 

.  35 

9,29,69 

.40 

9,29,79 

.30 

.35 

19,59,69 

.45 

9,29,89 

.40 

19,59,79 

.  50 

9,29,99 

.45 

19,59,89 

.25 

.50 

19,59,99 

9,39,49 

.30 

9,39,59 

.15 

.20 

.25 

29,39,49 

.35 

9,39,69 

.30 

29,39,59 

.40 

9,39,79 

.35 

29,39,69 

.45 

9,39,89 

.40 

29,39,79 

.50 

9,39,99 

.45 

29,39,89 

.30 

.50 

29,39,99 

9,49,59 

* 

.35 

9,49,69 

.25 

.30 

29,49,59 

.40 

9,49,79 

.35 

29,49,69 

.45 

9,49,89 

! 

.40 

29,49,79 

.50 

9,49,99 

.45 

29,49,89 

.35 

.50 

29,49,99 

9,59,69 

.40 

9,59,79 

.30 

.35 

29,59,69 

.  45 

9,59,89 

.40 

29,59,79 

•  50 

9,59,99 

.45 

29,59,89 

.20 

. 

.50 

29,59,99 

19,29,39 

.25 

19,29,49 

.20 

.25 

.30 

39,49,59 

,30 

19,29,59 

.35 

39,49,69 

.  35 

19,29,69 

.40 

39,49,79 

.40 

19,29,79 

.45 

39,49,89 

.45 

19,29,89 

.50 

39,49,99 

.50 

19,29,99 

.30 

.35 

39,59,69 

.40 

39,59,79 

.45 

39,59,89 

.50 

39,59,99 



FOUR  GRADES  N  =  300 


PI 


_P2 

.05 


.10 


.15 


.20 


.25 


.  30 


P3 

ACCEPT. 

NOS. 

PI 

P2 

P3 

ACCEPT. 

NOS. 

PI 

P2 

P3 

ACCfc^T. 

NOS. 

.10 

O  111  00 
-  *  1  *■  - 

.05 

.10 

.  15 

14,29,44 

,10 

.70 

.25 

29,59,74 

.15 

2,14,44 

.20 

14,29,59 

.30 

29,59,89 

.20 

2.1'-,  59 

.25 

14,29,74 

.35 

29,59,104 

.25 

2,14,74 

.30 

14,29,89 

.40 

29,59,119 

.30 

2,14,89 

.35 

14,29,104 

.45 

29,59,134 

.35 

2,14,104 

.40 

14,29,119 

.50 

29,59,149 

.40 

2,14,115 

.45 

14,29,134 

.45 

2,14,134 

.50 

14,29,149 

.25 

.30 

29,74,89 

.50 

2,14,149 

.35 

29,74,104 

.15 

.20 

14,44,59 

.40 

29,74,119 

.15 

2,29,44 

.25 

14,44,74 

.45 

29,74,131+ 

.20 

2,29,59 

.30 

14,44,89 

.50 

29,74,149 

.25 

2,29,74 

.35 

14,44,104 

.30 

2,29,89 

.40 

14,44,119 

.30 

.35 

29,89,104 

.35 

2,29,104 

.45 

14,44,134 

.40 

29,89,119 

.40 

2,29,115 

.50 

14,44,149 

.45 

29,89,134 

.45 

2,29,134 

.50 

29,89,149 

.50 

2,29,149 

.20 

.25 

14,59,74 

.30 

14,59,89 

.15 

.20 

.25 

44,59,74 

.20 

2,44,59. 

.35 

14,59,104 

.30 

44,59,89 

.25 

2,44,74 

.40 

14,59,119 

.35 

44,59,104 

.30 

2,44,89 

.45 

14,59,134 

.40 

44,59,119 

.35 

2,44,104 

.50 

14,59,149 

.45 

44,59,134 

.40 

2,44,119 

.50 

44,59,149 

.45 

2,44,134 

.25 

.30 

14,74,89 

.50 

2,44,149 

.35 

14,74,104 

.25 

.30 

44,74,89 

.40 

14,74,119 

.35 

44,74,104 

.25 

2,59,74 

.45 

14,74,134 

.40 

44,74,119 

.30 

2,59,89 

.50 

14,74,149 

.45 

44,74,134 

.35 

2,59,104 

.50 

44,74,149 

.40 

2,59,119 

.30 

.35 

14,89,104 

.45 

2,59,134 

.40 

14,89,119 

.30 

.35 

44,89,104 

.50 

2,59,149 

.45 

14,89,134 

.40 

44,89,119 

.50 

14,89,149 

. 

.45 

44,89,134 

.30 

2,74,89 

.50 

44,89,149 

.35 

2,74,104 

.10 

.15 

.20 

29,44,59 

.40 

2,74,119 

,25 

29,44,74 

.20 

.25 

.30 

59,74,89 

.45 

2,74,134 

.30 

29,44,89 

.35 

59,74,104 

.50 

2,74,149 

.35 

29,44,104 

.40 

59,74,119 

.40 

29,44,119 

.45 

59,74,134 

.35 

2,89,104 

.45 

29,44,134 

.50 

59,74,149 

.40 

2,89,119 

.50 

29,44,149 

,45 

' 2,89,134 

.30 

.35 

59,89,104 

.50 

2,89,149 

.40 

59,89,119 

■ 

.45 

59,89,134 

.50 

59,89,149

THE ABBA SEQUENCE:

A PROCEDURE FOR COMPARISON TESTING

Arthur Pillersdorf
Terminal Ballistics Division
Ballistic Research Laboratory
Aberdeen Proving Ground, Maryland

This paper introduces, if not a novel* concept, certainly a new
acronym: ABBA, more precisely, A-B-B-A. ABBA is an acronym and, as will be
seen shortly, a mnemonic term. The ABBA sequence is discussed here as an
alternative to the AB method. The latter term describes an accepted and
effective comparison procedure: repeated alternation. It is the sequential
procedure usually followed in comparing representative items of two batches,
A and B. The two syllables formed by the letter sequence A-B-B-A may be
vocalized, although "ABBA" is not an English word. The letters ABBA, however,
show the critical difference in the implied pattern. In contrast to the
unidirectional A-B-A-B, etc., ABBA is an iterative doubling back. In a
sequence of four operations, let two each be applied to two populations, A
and B. Then the sequence looks like:

            A    B
            1    2
            4    3

SUMS:       5    5

(NOTE: The numbers in the columns, the k_i's, are, strictly
speaking, ordinal rather than cardinal numbers.)

*We  learned  at  this  Conference,  at  the  lunch  table,  to  be  exact,  that  what 
we  call  the  ABBA  sequence  was  applied  at  the  National  Bureau  of  Standards 
many  years  ago. 




The A-B comparison process is not periodic or cyclic. It starts
at the left and moves right; then again at the left, thence to the right,
viz.:

            A    B
            1    2
            3    4

SUMS:       4    6
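
The column sums for the two orderings can be reproduced by giving each trial its ordinal position and summing the positions per lot. The sketch below is illustrative only; the function name position_sums is an assumption introduced here and is not the author's notation.

```python
from collections import defaultdict


def position_sums(order):
    """Sum the 1-based step numbers at which each lot is tested.

    `order` is a string such as "ABBA"; each character names the lot
    tested at that step. Hypothetical helper for illustration.
    """
    sums = defaultdict(int)
    for step, lot in enumerate(order, start=1):
        sums[lot] += step
    return dict(sums)


print(position_sums("ABBA"))  # {'A': 5, 'B': 5}
print(position_sums("ABAB"))  # {'A': 4, 'B': 6}
```

The doubled-back order gives both lots the same total, while simple alternation leaves lot B with the larger sum of step numbers.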

The  purpose  of  the  ABBA  sequence  Is  to  Improve  confidence  limits. 

These  confidence  limits  are  not  of  the  fiducial,  or  statistical  variety. 
Rather,  these  limits  are  of  the  human  variety,  and  refer  to  the  confidence 
of  three  groups  In  the  experimental  data  of  mutual  Interest.  These  groups 
•re:  the  experimenters,  the  technicians  and,  of  course,  the  statisticians. 

The  procedure  we  propose  will  be  recognized  In  Its  fundamental  logic 
•s  related  to  the  statistical  principle  of  blocking.  This  blocking  principle 
Is  exemplified  In  Latin  and  Graeco-Roman  squares  or  similar  planned  arrays* 
of  experlmentel  data.  It  Is  our  view  that  our  procedure  Is  a  prior  fundamen¬ 
tal.  It  tells  how  to  obtain  the  data  which  Is  later  treated  better,  from 
the  statistical  viewpoint.  The  experiential  basis  of  the  proposal  may  be 
singularly  our  own,  but  wa  doubt  this  very  much. 

Our  underlying  postulates  are  these: 

1.  Measuring  a  physical  property,  Injecting  a  chemical,  or  shooting 
a  sample  of  aaaainltlon  Is  equivalent,  sul  generis,  to  experimental  treatment. 
Hence,  plural  measurements  (treatments)  and  population  samples  are  combina¬ 
tions. 

*see,  for  example,  the  Youden  rectangle  concept. 
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2.  The  uncertainties  of  temperament,  temperature,  and  time,  give 
rise  to  sources  of  error,  bias,  and  sequential  or  cumulative  effects. 

3.  Firing  a  gun,  of  any  caliber,  Inserts  heat  int.'  a  dynamic  system. 
Thermal  energy  transfer  may  cause  changes  in  such  kinetic  parameters  as 
velocity,  yawing  motion,  or  recoiling  motion.  These  effects  are  known  In 
ballistics.  Hence,  we  view  each  shot  as  a  treatment. 

4. Planting seeds in each of several plots (= sampling each ground
lot) is also a treatment. Let all its seeds be planted first in one plot,
and then its seeds be planted in the second area of soil. Our view is
that the seed of the first plot was "treated" differently. It might be in
colder soil longer, or have more time to absorb initial moisture. Also,
during the planting of Plot 1, the "planter" may have lost or increased its
tension. The "planter" may be a human being, or a mechanical device
incorporating control cables and springs. Tension is still tension.

5. Therefore, it is desirable that similar times shall have elapsed
during the seeding (treatment) of all three plots of ground.

6. As a first approximation, the sums of the ordinal integers (in
plain English, the step numbers) should be as nearly equal as we choose.

The equalities may be required at any time during or after the experiment
(see (a) on the first page of this article).

7. A first choice is that the sums of the ordinal numbers (the
cumulative sum of the sequential positions) at the end of an experimental
interval shall be equal. If columns are lots and rows are samples, then
the sums of columns A, B, etc. ($\sum k_i$) should be equal*.

*Epilogue: we were pleased to hear Dr. Youden recall how he had once worked
on this equality of column sums and had found the attempt had been
made for another purpose in an old math book.

8. More generally, we may prefer that the total time intervals of
the preceding treatments, or sequence summations, are equal. This would
mean that the sums of the columns across as many rows, from the second row
to the last, should be equal. We shall see that we can have this half the
time.

From the foregoing, a basic value judgment can be inferred. In an
ABAB comparison, the time intervals within a column are equal. These are
the time intervals between successive samples within the group or lot. The
test samples of A (1, 3, 5) and B (2, 4, 6) have equal chances of something
going wrong, in time, within the groups A and B. But the environment,
equipment and personnel are also subject to error, random or otherwise. We
choose to equalize the error sources: time, temperature, and, psychologically,
temperament. These may affect the sample behavior more than its standby time.

Finally, if we study the array in (d) below, we see a singular difference
between ABBA and ABAB. Both are alike in that samples precede and follow the
others, one treatment at a time (A-1 precedes B-2; B-2 follows A-1 and precedes
A-3, etc.). But in the ABBA sequence, equal numbers of treatments in both
columns precede and follow another treatment (position or ordinal integer)
within the column. In brief (cf. (d)), there are pairs in the columns.

Our value judgment of vertical pairing for achieving better balance,
i.e., less cumulative, sequential bias, is supported by C. C. Li (1):

"The criterion for balanced sequences is that every treatment is
preceded or followed by all the other treatments, the same number of times."

The foregoing is cited as an advantage of alternate pairing with only
two populations or lots (A and B). With three or more lots (columns), the
pairing is found only in the extreme or "doubling-back" columns, i.e.,
under A and C only:


                 A    B    C

                 1    2    3
                 6    5    4
                 7    8    9
                12   11   10
                --   --   --
$\sum k_i$      26   26   26

                     (c)


ON  THE  ASPECT  OF  SAMPLE  SIZE 

If the sample number per lot, r, is very large, the difference in the
sums of the ordinal numbers under A and B ($\sum k_i$) becomes relatively small.

If the total number is forty, twenty samples per column, or lot, then dalet*
40, or $\triangleleft 40$, is 820. Since every element in the B column is one
greater than that in the same row in A, $\sum A_i + 20 = \sum B_i$, and
$\sum A_i = 400$, $\sum B_i = 420$.

The final difference, 20, is only 5 percent of $\sum A_i$. For N = 60, the
percentage difference is even less. Hence, for the final cumulative effect,
an A-B-A-B may be just as good as an ABBA sequence. But who has tested or
compared thirty pairs of experimental Nike-Hercules motors in one day, under
"steady-state" conditions, with a priori certainty that the experiment will
be completed?

The advantage of ABBA comes when there are interruptions, either
unforeseen or scheduled.
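The arithmetic above can be checked with a short sketch. The function names are illustrative and not part of the original paper; the only assumption is a strictly alternating A-B-A-B order, so that A receives the odd step numbers and B the even ones.

```python
def alternating_column_sums(n_total):
    """Sums of step numbers for columns A and B in an A-B-A-B sequence."""
    steps = range(1, n_total + 1)
    sum_a = sum(s for s in steps if s % 2 == 1)  # A takes the odd steps
    sum_b = sum(s for s in steps if s % 2 == 0)  # B takes the even steps
    return sum_a, sum_b


def relative_difference(n_total):
    """Final difference between the column sums as a fraction of the A sum."""
    sum_a, sum_b = alternating_column_sums(n_total)
    return (sum_b - sum_a) / sum_a


for n_total in (40, 60):
    sum_a, sum_b = alternating_column_sums(n_total)
    pct = 100 * relative_difference(n_total)
    print(f"N={n_total}: sum A={sum_a}, sum B={sum_b}, difference={pct:.2f}%")
```

The relative difference shrinks as N grows, which is the sense in which a long A-B-A-B run approaches the balance of ABBA at the final step only.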

THE  ABBA  SEQUENCE  AND  THE  DIAGONAL  SEQUENCE 
As indicated previously, the ABBA sequence has several features of
interest. For two samples, usually a standard and an experimental sample,
the ABBA sequence is an iteration of the staggered diagonal cycle. This
is seen in the following array:

*See "Dalet N and the ABBA Sequence," below.


                 A    B

                 1    2
                 4    3
                 5    6
                 8    7
                 9   10
                12   11
                --   --
$\sum k_i$      39   39

                  (d)


In the two-column array above, starting at A and ending at B is the
first swing. Starting the next cycle at B, the second column, and completing
the cycle by advancing to the next vacancy in the row, terminates at A. If
we have a series of columns, k, equal in number to the rows, r, we can have
a staggered cycling sequence which provides a diagonal of starting points,
e.g.:


CASE 5 X 5

      A    B    C    D    E

      1    2    3    4    5
     10    6    7    8    9
     14   15   11   12   13
     18   19   20   16   17
     22   23   24   25   21
     --   --   --   --   --
     65   65   65   65   65

The diagonal (1, 6, 11, 16, 21) gives us an r X k square.

Thus, if r = k, even if r is odd, we can attain the desired equality
of final column sums. This staggered cycling, or diagonal inception of
each succeeding row, is found in Youden's rectangle.
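The staggered diagonal cycle can be sketched as follows. This is a minimal illustration, assuming that row i begins in column i and wraps around to the left-hand columns; the function names are hypothetical.

```python
def diagonal_sequence(k):
    """k x k array of step numbers; each row starts one column further right."""
    table = [[0] * k for _ in range(k)]
    step = 1
    for row in range(k):
        for offset in range(k):
            col = (row + offset) % k  # wrap around after the last column
            table[row][col] = step
            step += 1
    return table


def column_sums(table):
    """Sum of the step numbers in each column."""
    return [sum(col) for col in zip(*table)]


square = diagonal_sequence(5)
for row in square:
    print(row)
print(column_sums(square))
```

Each column receives every within-row position exactly once, which is why the final column sums agree for any square, odd or even.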

For those who may insist on the repeated alternating cycle (A-B, A-B),
for whatever reasons, the following is reassuring: a two-population ABBA
sequence is nearly a repeated A-B. View the second and third steps (under B)
as one sample (of two items) and the fourth and fifth steps (under A) as the
other sample of two items. We are then testing alternate pairs. The
difference is that we begin with singleton A (A-1), and end with singleton
B (B-N), when the sample number for A equals that of B.

(Recall:         A    B

                 1    2        (step 1 is the singleton under A)
                 4    3
                 5    6                                        (f)
                 8    7
                 9   10        (step 10 is the singleton under B)

 The pairs under B are (2, 3) and (6, 7); the pairs under A are (4, 5)
 and (8, 9).)

Note that the ABBA sequences of (d) and (f) above give equal sums of the
ordinal numbers at every even-numbered row. This equality holds for any
number of columns.
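The equality at every even-numbered row can be checked numerically. The sketch below assumes the ABBA sequence generalizes to k columns by reversing direction on every second row, as in arrays (c) and (d); the function names are illustrative.

```python
def abba_sequence(rows, k=2):
    """Step numbers laid out left to right, then right to left, alternately."""
    table = []
    step = 1
    for row in range(rows):
        numbers = list(range(step, step + k))
        if row % 2 == 1:
            numbers.reverse()  # the "doubling-back" row
        table.append(numbers)
        step += k
    return table


def running_column_sums(table):
    """Cumulative column sums after each row."""
    totals = [0] * len(table[0])
    history = []
    for row in table:
        totals = [t + x for t, x in zip(totals, row)]
        history.append(list(totals))
    return history


for k in (2, 3, 4):
    history = running_column_sums(abba_sequence(6, k))
    even_rows_equal = all(len(set(h)) == 1 for h in history[1::2])
    print(k, history[-1], even_rows_equal)
```

An interrupted experiment that stops after any even-numbered row therefore still has balanced column sums, which is the practical advantage claimed for ABBA.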

If, as is often the case, r is much greater than k, we have another
problem. We can form successive k by k squares, as a choice. Then, at the
very best, we have a series of squares, at least 3 x 3. In these 3 x 3
squares the ordinal sums are equal only every third row. If such a series
of odd-sided squares has a total number of rows which is even, it would of
course be better to use ABBA. Then every second row is equal, including
the last. If both r and k are odd and r is not a multiple of k, what do we
do? Let k = 3, r = 5. A 3 by 3 diagonal plus a doublet (ABC - CBA) gives
equal sums at stage r = 3 and r = 5 (g), or at r = 2 and r = 5 (h).
CASE 3 X 5:

 (g)   A    B    C          (h)   A    B    C

       1    2    3                1    2    3
       6    4    5                6    5    4
       8    9    7                7    8    9
      --   --   --               --   --   --
      15   15   15               14   15   16

      10   11   12               12   10   11
      15   14   13               14   15   13
      --   --   --               --   --   --
      40   40   40               40   40   40

Here  we  have  the  compromise  or  combination  of  diagonal  and  ABBA  cycles. 

DALET N AND THE ABBA SEQUENCE

A new symbol is appropriate for indicating the sum of the ordinal
numbers of the total samples available for an experiment. The symbol we
propose is $\triangleleft$, dalet*. Dalet is a triangle, like its Greek
descendant, delta, but dalet points from right to left. It is applicable both
to a letter, $\triangleleft N$, and to a number, $\triangleleft 10$, as a
symbol of summation.

The sum of an arithmetic progression of the integers from zero to an
indefinite integer, N, is obtained from the equation:

$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2} = \triangleleft N \qquad (1)$$

Hence, dalet N, or dalet any number, is a "triangular number" in Pascal's
Triangle. The symbol dalet and equation (1) are useful for determining if
the sums of the columns of ordinal numbers can be equalized.

* Dalet (pronounced dah-let) is the name of the ancient Hebrew letter which
is fourth in the alphabet. It is the precursor of the Greek "delta."
Dalet means "a tent flap." Later, it came to mean "a door."
Let us distribute the N integers from one to N in rows (r) and
columns (k), with equal numbers of integers in each column. Then

$$\triangleleft N, \text{ or "dalet" } N, = \frac{kr(kr+1)}{2} \qquad (2)$$

In two columns of ten rows, the arithmetic progression, 1 to 20,
sums to

$$\frac{(2 \cdot 10)(2 \cdot 10 + 1)}{2}, \text{ or } 210 \quad (\triangleleft 20 = 210). \qquad (3)$$

To determine if there can be equality of sums of individual columns,
of equal rows each, it is necessary to use formula (2) and let k be odd:

$$\text{Then } 2 \mid k \cdot r(kr+1) \text{ implies } 2 \mid k \text{ or } 2 \mid r(kr+1). \qquad (4)$$

But 2 does not divide into k ($2 \nmid k$); hence $2 \mid r(kr+1)$.

An array of odd-numbered columns gives a sum that yields an integer
quotient regardless of whether rows are odd or even:

$$k \mid k \cdot \frac{r(kr+1)}{2} \qquad (8)$$

If the rows (r) are odd, and the columns (k) are even, then

$$k \nmid \frac{kr(kr+1)}{2} \quad \text{(k does not divide into ..., etc.)} \qquad (9)$$

If kr is even, then kr+1 is odd and $2 \nmid kr+1$; since $2 \mid kr$ and
$2 \nmid r$ (r is odd), then

$$\frac{r(kr+1)}{2}$$

is not an integer.
Therefore, odd rows do not automatically yield equal sums of columns
if the columns be even. Examples:

      1    2
      4    3
      5    6
     --   --
     10   11

$\triangleleft 20 = 210$:

      1    2    3    4
      8    7    6    5
      9   10   11   12
     16   15   14   13
     17   18   19   20

$210/4$ is not an integer.


The reader will note that the foregoing treatment applies to the case
of equal sample numbers, or rows, however small, for each population sample
or column. The same procedure, however, can be applied to a group of unequal
samples, i.e.,

1. Determine N, the total number of samples.

2. Determine $\triangleleft N$.

3. Divide by k, the number of columns or populations or lots. If the
quotient is an integer, the sums of the ordinal numbers can be equal for all
columns or lots.
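The three steps above can be sketched as a divisibility check. The function names are illustrative; the check only tests the condition stated in step 3 (that dalet N is divisible by k) and does not itself construct a balanced arrangement.

```python
def dalet(n):
    """Triangular number: the sum of the step numbers 1..n."""
    return n * (n + 1) // 2


def equal_column_sums_possible(n_total, k):
    """True when dalet(n_total) divides evenly among k columns (step 3)."""
    return dalet(n_total) % k == 0


# (N, k) pairs taken from the arrays discussed in the text.
for n_total, k in ((6, 2), (12, 3), (20, 4), (25, 5)):
    print(n_total, k, dalet(n_total), equal_column_sums_possible(n_total, k))
```

For N = 12 and k = 3 the quotient is 26, the common column sum reached in array (c); for N = 20 and k = 4 the quotient 52.5 rules equality out.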


ON  CASES  OF  UNEQUAL  SAMPLE  SIZES 

For statistical inferences based on application of Student's t and
the t-like (t*) statistics, it has been shown that both statistics have the
same value when two samples are of the same size. The mathematical
expression for df* (degrees of freedom for the t-like or t* statistic)
simplifies considerably when $n_1 = n_2$. Further, when the two groups are of
equal size, the value of df* reduces to 2(n - 1), wholesomely large, if the
variances(1) of the two groups, $s_1^2$ and $s_2^2$, are equal(2). This is
taken to mean that when group variances are unequal it is even more desirable
to have groups of equal size. In a planned experiment, therefore, hope for
equal numbers of samples.

(1) More precisely, this is the estimate of the variance with df = n - 1.

(2) These comments can be explored in Reference 1 (Li).

Let us suppose, however, that the samples are small in number, and of
unequal size. This situation happens when the test items are expensive,
experimental, or exotic. Another reason for unequal numbers of observations
of samples may be the exigencies of time. What is the simplest rule for any
number of plots, blocks, or columns, when the numbers of rows or samples per
lot are unequal? A uniform procedure would be to start with the sample of
largest number.

CASE 5-4-3 shows how the equality of final sums requires abandoning
a partial square (a) with its diagonal 3 X 3 array:

            (a)  A    B    C        (b)  A    B    C

                 1                       1
                 2    3                  2    3
                 4    5    6             6    4    5
                 9    7    8             7    8    9
                11   12   10            10   11   12
                --   --   --            --   --   --
$\sum k_i$      27   27   24            26   26   26

Note that $\triangleleft N$ is divisible by k.

Case (b) illustrates that the use of dalet*, or $\triangleleft N$, here
$\triangleleft 12$, is the first order of business. Second is the injunction
evident in Case (a) also: use up the surplus first!

*The precursor of the Greek letter delta is this ancient Hebrew form of the
letter "dalet" (modern type ד).
INTRODUCTION AND SUMMARY. When a sample discriminant function is
computed, it is desired to estimate the chance of misclassification using this
discriminant function. This is often done by classifying the sample using
the sample discriminant function or by computing $\Phi(-D/2)$, where $\Phi$ is
the cumulative normal distribution and $D^2$ is Mahalanobis' distance. When the
sample discriminant function is applied to a new sample, the observed
probabilities of misclassification are usually found to be greater than those
computed from the initial sample.

The purposes of this paper are to show that this increase in the
probabilities of misclassification is directly related to the "shrinkage"
of $R^2$ in new samples and that these are related to the unbiased estimation
of Mahalanobis' $\delta^2$ using $D^2$.

DISCRIMINANT ANALYSIS. Discriminant analysis provides a method of
obtaining a function of a set of p multivariate observations which provides
maximum separation between groups. In this paper we shall be concerned only
with the case of two groups. Let $\pi_1$ denote the first population, $\pi_2$
the second, $x = (x_1, x_2, \ldots, x_p)'$ be a column vector of observations,
$\mu_k$ the mean vector in the kth group (k = 1, 2), $\Sigma$ the common
covariance matrix, and $\bar{x}_k$ and $S$ the sample means and covariances.

It is well known that the sample discriminant function for discriminating
two groups is

(1)  $D_s(x) = \left(x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\right)' S^{-1} (\bar{x}_1 - \bar{x}_2)$

which is conditionally (on $\bar{x}_1$, $\bar{x}_2$, and $S$) normally
distributed and has mean (in the kth group),

The remainder of this article was reproduced photographically from the author's
copy.

(2)  $D_s(\mu_k) = \left(\mu_k - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\right)' S^{-1} (\bar{x}_1 - \bar{x}_2)$

and variance (in either group),

(3)  $V_D = (\bar{x}_1 - \bar{x}_2)' S^{-1} \Sigma S^{-1} (\bar{x}_1 - \bar{x}_2)$.

If it is known that an individual, randomly selected from the
population, has probability q of belonging to group 1 and 1-q of
belonging to group 2, then the classification rule that is used is "classify
x into $\pi_1$ if

(4)  $D_s(x) + \log\dfrac{q}{1-q} > 0$

and into $\pi_2$ otherwise."

In this paper we will assume q = .5, so $\log\dfrac{q}{1-q} = 0$.

If x is multivariate normally distributed, then the probability of
misclassifying x conditional on $\bar{x}_1$, $\bar{x}_2$, and $S$ is

(5)  $P_1 = P(D_s(x) < 0 \mid x \in \pi_1)$

or   $P_2 = P(D_s(x) > 0 \mid x \in \pi_2)$.

$P_1$ is given by

(6)  $P_1 = \Phi\left(-D_s(\mu_1)/\sqrt{V_D}\right)$

and a similar expression holds for $P_2$, where $\Phi$ is the cumulative normal
distribution.

Estimating $D_s(\mu_1)$ by $D_s(\bar{x}_1) = D^2/2$ and $V_D$ by

$$\sum_{k=1}^{2} \sum_{i=1}^{n_k} \left(D_s(x_{ki}) - D_s(\bar{x}_k)\right)^2 / (n_1+n_2-2)$$

is equivalent to estimating $\mu_k$ by $\bar{x}_k$ and $\Sigma$ by $S$.
Thus we obtain

(7)  $\hat{P}_1 = \Phi\left(-\dfrac{D^2/2}{\sqrt{D^2}}\right) = \Phi(-D/2)$

where $D^2 = (\bar{x}_1 - \bar{x}_2)' S^{-1} (\bar{x}_1 - \bar{x}_2)$ is
Mahalanobis' distance. This is a biased estimate of $P_1$ and gives too
favorable an estimate of $P_1$.

Thus, in general, we should expect to find a higher rate of
misclassification when applying the sample discriminant function to new data
than indicated by $\Phi(-D/2)$.

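It is of some interest to consider the expectation of $D_s(\mu_k)$ and
$V_D$ over repeated samples of size $n_1$ and $n_2$. We shall need the
expectations of $S^{-1}$ and $S^{-1} \Sigma S^{-1}$. Lachenbruch and Mickey
[1965] have shown that

(8)  $E(S^{-1}) = \dfrac{n_1+n_2-2}{n_1+n_2-p-3}\, \Sigma^{-1} = C_1 \Sigma^{-1}$

     $E(S^{-1} \Sigma S^{-1}) = \dfrac{(n_1+n_2-3)(n_1+n_2-2)^2}{(n_1+n_2-p-2)(n_1+n_2-p-3)(n_1+n_2-p-5)}\, \Sigma^{-1} = C_2 \Sigma^{-1}$.

(9)  $E(D_s(\mu_k)) = \mathrm{tr}\, E\left((\bar{x}_1 - \bar{x}_2)\left(\mu_k - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\right)' S^{-1}\right)$

     $= \mathrm{tr}\, E\left((\bar{x}_1 - \bar{x}_2)\left(\mu_k - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\right)'\right) \Sigma^{-1} C_1$

     $= C_1 \left[ \dfrac{\delta^2}{2} (-1)^{k+1} - \dfrac{p(n_2 - n_1)}{2 n_1 n_2} \right]$.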
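Similarly, we have for $V_D$

(10)  $E(V_D) = E\left((\bar{x}_1 - \bar{x}_2)' S^{-1} \Sigma S^{-1} (\bar{x}_1 - \bar{x}_2)\right)$

      $= \mathrm{tr}\, E\left((\bar{x}_1 - \bar{x}_2)(\bar{x}_1 - \bar{x}_2)' S^{-1} \Sigma S^{-1}\right)$

      $= \mathrm{tr}\, E\left((\bar{x}_1 - \bar{x}_2)(\bar{x}_1 - \bar{x}_2)'\right) \Sigma^{-1} C_2$

      $= \mathrm{tr}\left[ (\mu_1 - \mu_2)(\mu_1 - \mu_2)' + \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) \Sigma \right] \Sigma^{-1} C_2$

      $= C_2 \left[ (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2) + \dfrac{p(n_1+n_2)}{n_1 n_2} \right]$.

Thus,

(11)  $E(D_s(\mu_k)) = \dfrac{n_1+n_2-2}{2(n_1+n_2-p-3)} \left[ \delta^2 (-1)^{k+1} - \dfrac{p(n_2 - n_1)}{n_1 n_2} \right]$

      $E(V_D) = \left[ \delta^2 + \dfrac{p(n_1+n_2)}{n_1 n_2} \right] \dfrac{(n_1+n_2-3)(n_1+n_2-2)^2}{(n_1+n_2-p-2)(n_1+n_2-p-3)(n_1+n_2-p-5)}$.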


Although $D_s(x)$ is normally distributed conditionally on $\bar{x}_1$,
$\bar{x}_2$, and $S$, it is not unconditionally normally distributed. For
$n_1$ and $n_2$ sufficiently large, the unconditional distribution is very
close to normal. Thus, considering the values of

(12)  $\bar{P}_1 = \Phi\left( E(-D_s(\mu_1)) / \sqrt{E(V_D)} \right)$

and   $\bar{P}_2 = \Phi\left( E(D_s(\mu_2)) / \sqrt{E(V_D)} \right)$

will supply approximate values of $P_1$ and $P_2$ for samples of sizes $n_1$
and $n_2$. There are three error rates of interest:

(a) The error rate for the particular sample discriminant
function. This is given by (6).

(b) The expected error rate over all samples of size $n_1$, $n_2$.
This is given by (12).

(c) The error rate that would hold if we knew the parameters
of the distribution. This is given by $\Phi(-\delta/2)$.

Some properties of equations (9), (10), and (12) are of interest.

First, if $n_1 = n_2$, $|E(D_s(\mu_k))| > \delta^2/2$. For large $n_1$, $n_2$,
$E(D_s(\mu_k)) \to \dfrac{\delta^2}{2}(-1)^{k+1}$.

In general, the variance of the sample discriminant function is always
greater than the variance of the population discriminant function.

The properties of $E(D_s(\mu_k))$ and $E(V_D)$ imply that

a) If $n_1 = n_2$, $|E(D_s(\mu_k))| / \sqrt{E(V_D)} < \delta/2$.

b) If $n_1/n_2$ is large or $n_2/n_1$ is large and $\delta$ is small, then
one of $|E(D_s(\mu_k))| / \sqrt{E(V_D)}$ will be $> \delta/2$ and the other
will be $< \delta/2$.

c) In most circumstances we will have $|E(D_s(\mu_k))| / \sqrt{E(V_D)} < \delta/2$,
so we may conclude that the probability of misclassification
in either group is greater than the optimum, $\Phi(-\delta/2)$.

Table 1 gives examples of the ratio $E(D_s(\mu_k)) / \sqrt{E(V_D)}$ for
various values of $\delta^2$, $n_1$, $n_2$ and p.


Table 1. Ratios Used in Calculating Error Rates

   p    n1    n2    δ²    E(D_s(μ1))/√E(V_D)    E(D_s(μ2))/√E(V_D)    δ/2

   2     6     6     1          .3086                 -.3086           .5
   2     6     6     4          .7377                 -.7377          1.0
   2     4    20     1          .2189                 -.5108           .5
   2     4    20     4          .7747                 -.9469          1.0
   4    12    12     1          .3368                 -.3368           .5
   4    12    12     4          .8051                 -.8051          1.0
   4     4    20     1          .0586                 -.5277           .5
   4     4    20     4          .6102                 -.9153          1.0
  10    30    30     1          .3478                 -.3478           .5
  10    30    30     4          .8313                 -.8313          1.0
  10    10    50     1          .0605                 -.5448           .5
  10    10    50     4          .6300                 -.9450          1.0
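The ratios in Table 1 can be reproduced from the expressions for E(D_s(μ_k)) and E(V_D) in equation (11). The sketch below is an illustration only; the function names are not from the paper, and the formulas are coded as reconstructed above.

```python
import math


def c1(n1, n2, p):
    """Constant C_1 from E(S^-1) = C_1 * Sigma^-1."""
    return (n1 + n2 - 2) / (n1 + n2 - p - 3)


def c2(n1, n2, p):
    """Constant C_2 from E(S^-1 Sigma S^-1) = C_2 * Sigma^-1."""
    n = n1 + n2
    return (n - 3) * (n - 2) ** 2 / ((n - p - 2) * (n - p - 3) * (n - p - 5))


def expected_mean(delta2, n1, n2, p, k):
    """E(D_s(mu_k)) for group k = 1 or 2."""
    sign = 1 if k == 1 else -1
    bias = p * (n2 - n1) / (2 * n1 * n2)
    return c1(n1, n2, p) * (sign * delta2 / 2 - bias)


def expected_variance(delta2, n1, n2, p):
    """E(V_D), the same in either group."""
    return c2(n1, n2, p) * (delta2 + p * (n1 + n2) / (n1 * n2))


def ratio(delta2, n1, n2, p, k):
    """E(D_s(mu_k)) / sqrt(E(V_D)), the quantity tabulated in Table 1."""
    mean = expected_mean(delta2, n1, n2, p, k)
    return mean / math.sqrt(expected_variance(delta2, n1, n2, p))


for p, n1, n2, delta2 in ((2, 6, 6, 1), (2, 4, 20, 1), (4, 4, 20, 4)):
    r1 = ratio(delta2, n1, n2, p, 1)
    r2 = ratio(delta2, n1, n2, p, 2)
    print(p, n1, n2, delta2, round(r1, 4), round(r2, 4))
```

With unequal group sizes the smaller group has the smaller ratio in absolute value, and hence the larger expected error rate.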
Finally, we note that we may use the unbiased estimate of $\delta^2$ based
on $D^2$ to obtain

(13)  $\hat{E}(D_s(\mu_1)) = D^2/2 - C_1\, p/n_1$

      $\hat{E}(D_s(\mu_2)) = -D^2/2 + C_1\, p/n_2$

      $\hat{E}(V_D) = D^2\, \dfrac{(n_1+n_2-3)(n_1+n_2-2)}{(n_1+n_2-p-2)(n_1+n_2-p-5)}$

Thus we obtain an estimate of $P_1$ for the discriminant function
based on samples of size $n_1$, $n_2$:

      $\hat{P}_1 = \Phi\left( -\left(D^2/2 - C_1\, p/n_1\right) / \sqrt{\hat{E}(V_D)} \right)$

which is always greater than $\Phi(-D/2)$.

Sample Size for Discriminant Functions

The above results may be used to determine the sample size required
to obtain error rates within a given tolerance of the optimum.

The question we ask is "How large should $n_1$ and $n_2$ be for the
sample discriminant function to have an error rate within $\gamma$ of the
optimum value?" The answer depends on p, $\gamma$, and $\delta^2$. For equal
sample sizes the results are given in Table 2.

From Table 2, we see that $\gamma = .1$ yields very small sample sizes,
while $\gamma = .01$ causes large samples to be taken. The larger the
separation between the groups, the smaller the sample size needed. As p
increases, the sample size also increases, but the ratio n/p decreases for
fixed $\delta^2$ and $\gamma$.

Because of the non-linear relation between $n_1$, $n_2$, p, $\delta^2$ and
$P_1$, for $\gamma = .1$ we find that a larger sample is needed for
$\delta^2 = 4$ than for
Table 2.

Minimum Sample Size, n (= n1 = n2), in Each Group Required for Expected
Error Rate, P̄1, to be Within γ of Optimum Error Rate, P1,
for Various Numbers of Parameters, p.

   p     P1      γ      P̄1       n

   2    .309    .1     .395       5
        .159    .1     .256       5
        .067    .1     .151       5
        .309    .05    .356       9
        .159    .05    .206       8
        .067    .05    .111       7
        .309    .01    .318      47
        .159    .01    .169      32
        .067    .01    .077      22

   4    .309    .1     .403       7
        .159    .1     .245       8
        .067    .1     .154       7
        .309    .05    .358      15
        .159    .05    .206      13
        .067    .05    .116      10
        .309    .01    .318      89
        .159    .01    .169      56
        .067    .01    .077      37

   6    .309    .1     .407       9
        .159    .1     .253      10
        .067    .1     .157       9
        .309    .05    .357      22
        .159    .05    .206      18
        .067    .05    .113      14
        .309    .01    .319     130
        .159    .01    .169      80
        .067    .01    .077      51

   8    .309    .1     .403      12
        .159    .1     .258      12
        .067    .1     .159      11
        .309    .05    .358      28
        .159    .05    .208      22
        .067    .05    .115      17
        .309    .01    .319     172
        .159    .01    .169     104
        .067    .01    .077      66

  10    .309    .1     .406      14
        .159    .1     .253      15
        .067    .1     .160      13
        .309    .05    .358      35
        .159    .05    .208      27
        .067    .05    .117      20
        .309    .01    .319     213
        .159    .01    .169     129
        .067    .01    .077      81

  20    .309    .1     .406      26
        .159    .1     .256      27
        .067    .1     .163      23
        .309    .05    .358      67
        .159    .05    .209      51
        .067    .05    .116      38
        .309    .01    .319     421
        .159    .01    .169     250
        .067    .01    .077     154
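A search of the kind that produces Table 2 might look like the sketch below. It is a hedged illustration, not the author's program: the function names, the linear search, the search limit `n_max`, and the starting point (the smallest n for which $n_1+n_2-p-5 > 0$) are all assumptions, and the expected error rate uses the normal approximation of equation (12) with $n_1 = n_2 = n$.

```python
import math


def normal_cdf(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_error_rate(delta2, n, p):
    """Approximate expected error rate from (12) for n1 = n2 = n."""
    total = 2 * n
    c1 = (total - 2) / (total - p - 3)
    c2 = ((total - 3) * (total - 2) ** 2
          / ((total - p - 2) * (total - p - 3) * (total - p - 5)))
    mean = c1 * delta2 / 2               # the bias term vanishes when n1 = n2
    variance = c2 * (delta2 + 2 * p / n)  # p(n1+n2)/(n1*n2) = 2p/n
    return normal_cdf(-mean / math.sqrt(variance))


def minimum_sample_size(delta2, p, gamma, n_max=10000):
    """Smallest n per group with expected error within gamma of the optimum."""
    optimum = normal_cdf(-math.sqrt(delta2) / 2)
    n = (p + 5) // 2 + 1  # smallest n for which 2n - p - 5 is positive
    while n <= n_max:
        if expected_error_rate(delta2, n, p) <= optimum + gamma:
            return n
        n += 1
    return None  # no n up to n_max meets the tolerance


print(minimum_sample_size(1, 2, 0.1), minimum_sample_size(1, 4, 0.1))
```

The optimum rates .309, .159 and .067 in Table 2 correspond to $\delta^2$ = 1, 4 and 9, since they equal $\Phi(-.5)$, $\Phi(-1)$ and $\Phi(-1.5)$.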
The Regression Analogy

Fisher (1936) shows that by performing a regression analysis with
the dependent variable equal to $n_2/(n_1+n_2)$ in the first group and
$-n_1/(n_1+n_2)$ in the second, the regression coefficients obtained are
proportional to the discriminant coefficients. In fact, this is true for any
two distinct values of the dependent variable. See, e.g., Cramer (1967). The
analysis of variance of this regression yields the same F as the $D^2$
analysis does. Thus,

(14)  $F = \dfrac{R^2}{1-R^2} \cdot \dfrac{n_1+n_2-p-1}{p}$

and

(15)  $F = D^2\, \dfrac{n_1 n_2 (n_1+n_2-p-1)}{p\,(n_1+n_2)(n_1+n_2-2)}$

are two ways of expressing the same F with p and $n_1+n_2-p-1$ degrees of
freedom. Thus

(16)  $D^2\, \dfrac{n_1 n_2}{(n_1+n_2)(n_1+n_2-2)} = \dfrac{R^2}{1-R^2}$

which is equivalent to

(17)  $D^2 = \dfrac{R^2}{1-R^2} \cdot \dfrac{(n_1+n_2)(n_1+n_2-2)}{n_1 n_2}$

or

      $R^2 = \dfrac{D^2}{D^2 + \dfrac{(n_1+n_2)(n_1+n_2-2)}{n_1 n_2}}$.

These relations will be useful later.
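Relations (14) through (17) can be checked numerically with a short sketch. The function names are illustrative and the example values of $D^2$, $n_1$, $n_2$ and p are arbitrary.

```python
def scale(n1, n2):
    """The factor (n1+n2)(n1+n2-2)/(n1*n2) appearing in (16) and (17)."""
    return (n1 + n2) * (n1 + n2 - 2) / (n1 * n2)


def r2_from_d2(d2, n1, n2):
    """Second form of (17): R^2 expressed through D^2."""
    return d2 / (d2 + scale(n1, n2))


def d2_from_r2(r2, n1, n2):
    """First form of (17): D^2 expressed through R^2."""
    return r2 / (1 - r2) * scale(n1, n2)


def f_from_r2(r2, n1, n2, p):
    """Equation (14)."""
    return r2 / (1 - r2) * (n1 + n2 - p - 1) / p


def f_from_d2(d2, n1, n2, p):
    """Equation (15)."""
    return d2 * n1 * n2 * (n1 + n2 - p - 1) / (p * (n1 + n2) * (n1 + n2 - 2))


d2, n1, n2, p = 1.0, 20, 20, 4
r2 = r2_from_d2(d2, n1, n2)
print(r2, d2_from_r2(r2, n1, n2))
print(f_from_r2(r2, n1, n2, p), f_from_d2(d2, n1, n2, p))
```

The two F values agree to rounding error, which is the identity the text relies on when it carries shrinkage results for $R^2$ over to $D^2$.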

Shrinkage

In using a set of regression coefficients computed from a sample
for prediction purposes, it is found that the correlation between predicted
and observed values in a new sample is less than R. This phenomenon is
well known as the "shrinkage" of the multiple correlation coefficient. A
number of methods have been proposed to deal with the problem of obtaining
estimates of the "shrunken" $R^2$. There are at least two correlations of
interest. First, the population multiple correlation coefficient $\rho^2$ that
would hold if we knew the parameters of the population. This value is the
proportion of the variance that can be accounted for by the independent
variables. The other quantity, which Lachenbruch and Mickey (1965) refer to
as $\rho_1^2$, the Prediction Correlation Coefficient, is the correlation
between the sample regression line and the dependent variable.

The following relation holds:

(18)  $\rho_1^2 \le \rho^2 < E(R^2)$.

Approximate unbiased estimation of $\rho^2$ from $R^2$ can be done easily
and methods of doing this will be discussed in the next section. Estimation
of $\rho_1^2$ is a more difficult problem which can be handled fairly well by a
technique described in Lachenbruch and Mickey (1965).
An exact formula for estimating $\rho^2$ is given by Olkin and Pratt (1958).
Letting $R_c^2$ denote the estimate, they show that

(19)  $R_c^2 = 1 - \dfrac{n-3}{n-p-1}\, (1-R^2)\, F\left(1, 1; \tfrac{1}{2}(n-p+1); 1-R^2\right)$

is an unbiased estimate of $\rho^2$, where $n = n_1 + n_2$, p = number of
variables, and $F(\cdot)$ is the hypergeometric function.

A first-order approximation to an unbiased estimate was given by
Wherry (1931) and is easier to work with:

(20)  $R_c^2 = R^2 - \dfrac{(1-R^2)(p-1)}{n-p}$

We will use formula (20) in the ensuing work.

Estimation of $P_1$ and $P_2$

A number of methods for estimating $P_1$ and $P_2$ have been suggested
[Lachenbruch and Mickey (1968)]. For this paper, we shall be concerned with
methods based on $D^2$. Okamoto (1963) has given an approximation based on
$n_1$, $n_2$ and $\delta^2$, the theoretical distance between the populations.

Equations (17) suggest that one might estimate a "shrunk" $D^2$ by

(21)  $D_c^2 = \dfrac{R_c^2}{1-R_c^2} \cdot \dfrac{(n_1+n_2)(n_1+n_2-2)}{n_1 n_2}$.

From (20) we obtain

(22)  $D_c^2 = D^2\, \dfrac{n_1+n_2-p}{n_1+n_2-1} - \dfrac{(p-1)(n_1+n_2)(n_1+n_2-2)}{(n_1+n_2-1)\, n_1 n_2}$.

Table 3 gives values of the multiplier of $D^2$ and the correction
term for some combinations of $n_1 = n_2 = n$, and p.
Table 3. Multiplier and Constant Terms for "Shrunk" Estimate of D²

     n      p    multiplier    correction

    20      2       .97           .10
    50      2       .99-          .02
   100      2       .99+          .002
    20      4       .92           .29
    50      4       .97           .12
   100      4       .98           .06
    20     10       .77           .88
    50     10       .91           .36
   100     10       .95           .18

Thus, if $D^2 = 1.0$, n = 20, p = 10, $D_c^2 = .77 - .88 = -.11$, which
illustrates one of the drawbacks of using unbiased estimation for $D^2$. If
n = 100, but other values were the same, we would have
$D_c^2 = .95 - .18 = .77$. When $D^2$ is small, and the number of parameters
is large relative to the number of observations, the value of $D_c^2$ may be
negative.

In Lachenbruch and Mickey (1968), it is noted that an unbiased estimate
of $\delta^2$ based on $D^2$ may be obtained from the non-central F
distribution. This is another candidate for the value of $D_c^2$ and its value
is given by

(23)  $D_c^2 = \dfrac{n_1+n_2-p-3}{n_1+n_2-2}\, D^2 - \dfrac{(n_1+n_2)\, p}{n_1 n_2}$.

Equations (22) and (23) agree asymptotically, as they should.
The difference between them is due to the approximation used to obtain
equation (22), and to the fact that the $R^2$ computed from the discriminant
analysis is based on only two possible values of the dependent variable.
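The two candidate "shrunk" estimates can be compared with a short sketch. The function names are illustrative, and the formulas are coded from equations (22) and (23) as reconstructed above.

```python
def shrunk_d2_wherry(d2, n1, n2, p):
    """Equation (22): shrunk D^2 via the first-order (Wherry) correction."""
    n = n1 + n2
    multiplier = (n - p) / (n - 1)
    correction = (p - 1) * n * (n - 2) / ((n - 1) * n1 * n2)
    return multiplier * d2 - correction


def unbiased_delta2(d2, n1, n2, p):
    """Equation (23): estimate of delta^2 via the non-central F distribution."""
    n = n1 + n2
    return (n - p - 3) / (n - 2) * d2 - n * p / (n1 * n2)


for n in (20, 100, 1000):
    a = shrunk_d2_wherry(1.0, n, n, 10)
    b = unbiased_delta2(1.0, n, n, 10)
    print(n, round(a, 4), round(b, 4))
```

Both estimates can go negative when $D^2$ is small and p is large relative to n, and the gap between them narrows as n grows.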
DISCUSSION. When the population parameters are known, it is easy
to show that the probabilities of misclassification are given by
$\Phi(-\delta/2)$, where $\delta^2$ is Mahalanobis' distance between
populations. Thus, the probabilities of misclassification increase as
$\delta^2$ decreases. Okamoto's work indicates that this relation holds when
estimates are used for the population parameters. Since $D^2$ is an
overestimate of $\delta^2$, $\Phi(-D/2)$ will always underestimate the true
probabilities of misclassification. Similarly, the fact that $R^2$ is an
overestimate for $\rho^2$ and $\rho_1^2$ and the correspondence with $D^2$
through the F statistic indicate the relationship between the shrinkage
of $R^2$ and the increase of probabilities of misclassification.
INTRA-PROFILE VARIANCE*
(INTRA-INDIVIDUAL VARIANCE)

Claude F. Bridges**
Institutional Research Division
Office of Research
United States Military Academy
West Point, New York


This discussion does not present a sophisticated new statistic; rather,
attention is called to an easily obtained but seldom used type of significant
difference between specimens, individuals or groups. The title, intra-profile
variance, should be meaningful to counselors, psychologists, and statisticians
in the education and personnel fields. The comparable sub-title may be more
meaningful to statisticians and researchers in other fields. The applications
discussed are in the personnel areas, but the profile variance statistic could
be made applicable whenever several characteristics or attributes of individual
specimens, components, or other units are being measured.

Table 1 illustrates the individual differences which the proposed statistic
reflects. Individuals "o" and "*" both have the same average standard score
on the four characteristics measured by $X_1$, $X_2$, $X_3$, and $X_4$, but are
quite different individuals. The difference in consistency of relative level in
the four distributions suggests that there may be a difference in the
predictability of performance for the two and that the quantification of such
intra-individual characteristics might prove useful.
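The idea can be illustrated with a small sketch. The scores below are hypothetical, not the values plotted in Table 1, and the use of the population variance (divisor equal to the number of measures) is an assumption, since this passage does not fix the divisor.

```python
from statistics import mean, pvariance

# Hypothetical standard scores on four measures X1..X4 for two individuals.
profiles = {
    "o": [50, 52, 48, 50],  # consistent level across the four measures
    "*": [70, 30, 60, 40],  # same mean, widely scattered levels
}


def profile_summary(scores):
    """Mean level and intra-profile variance of one individual's scores."""
    return mean(scores), pvariance(scores)


for label, scores in profiles.items():
    level, spread = profile_summary(scores)
    print(label, level, spread)
```

A composite based on the mean alone cannot separate these two individuals; the variance across the profile is the additional quantity the paper proposes to examine.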

My  initial  interaat  in  this  problem  stemmed  from  aoma  remarks  mads  by 
Irving  Lorga  in  1947.  Ha  thought  that,  especially  for  soma  groups  of  personality 
factors,  consistency  in  level  might  be  indicative  of  adjustment.  Dr.  Lorge 
hypothesised  that  statistical  representation  of  such  intra-individual  differences 
would,  for  some  purposes  et  least,  prove  to  contribute  significantly  to  more 
valid  prediction*  than  thuae  baaed  solely  upon  inter-lndlvlduel  differences. 

In  some  types  of  situations  high  intra-individual  variability  might  ba  more 
desirable,  in  others  being  at  about  the  aame  level,  "consistent  across  the  board," 
could  load  to  greater  predictability  of  performance. 

However, the concept is not as new as was originally thought. In checking
the literature this was found to be yet another area which had been investigated
by Clark Hull in 1927. In an article entitled, "Variability in Amount of
Different Traits Possessed by The Individual," he compared the variability among


*This is a further analysis of a concept reported at the September 1966 conference
of the Military Testing Association and at the March 1967 special session of the
Psychometric Society.

**Any views expressed in this paper are those of the author. They should not be
interpreted as reflecting the views of the United States Military Academy or the
Department of the Army.

TABLE 1

SAMPLE OF DIFFERENCES IN INTRA-PROFILE VARIANCE

[Chart not recoverable from the source: standard-score profiles of individuals "o" and "*" by measure, with their mean and relative-level labels such as Typical and Very Low.]


different persons in trait measures with the variability in a single individual
on these same traits (Hull, 1927). He found the amount of variability within
single individuals to be about 80 percent of the amount of variability among
different individuals. Apparently a significant source of differences between
individuals is being ignored when using one composite, or weighted average, of
an individual's scores for measurement, prediction, or decision-making purposes.

Research  by  psychologists,  educators  and  statisticians  concerned  with 
personnel  problems  has  evidenced  increased  interest  in  intra-individual 
differences  such  as  those  indicated  by  variability  among  measures  of  different 
attributes of an individual and by the more complete profile or pattern analysis
techniques.  Much  of  this  interest  results  from  increased  recognition  that  the 
interrelationships  within  an  individual  of  a  group  of  measures  may  be  quite 
different  from  the  interrelationships  between  these  measures  in  the  general 
population.  Current  moderator  variable  research  has  found  in  some  situations 
a  variable  that  successfully  identifies  subgroups  within  a  population  for  which 
the  interrelationships  among  variables  differ  significantly. 


While  complete  pattern  or  profile  analysis  techniques  entail  several 
relatively  complex  problems,  a  statistic  representing  intra-profile,  or  intra- 
individual,  variance  is  easily  obtained. 


When $SD_p^2$ = intra-profile variance; n = number of tests, factors, subtests
or other characteristics measured and reported on comparable scales; and $\sum x^2$ =
the sum of the squares of the deviations of an individual's scores on each
variable from his mean score on all n variables, then:

$$SD_p^2 = \frac{\sum x^2}{n} \tag{1}$$

If $\sum X$ = sum of one man's scores on all n variables, and $\sum X^2$ = the scores
squared and added, the gross score formula would be:

$$SD_p^2 = \frac{n \sum X^2 - (\sum X)^2}{n^2} \tag{2}$$


When beta weights for comparably scaled scores on the different abilities or
traits are available, these could be used to obtain a measure of intra-profile
variance that should have more validity. The deviation formula for weighted
intra-profile variance would be:

$$SD_{pw}^2 = \frac{\sum W x^2}{\sum W} \tag{3}$$

where W is the weight for each variable and x is the deviation of each score
from the individual's weighted mean score. The corresponding gross score
formula would be:

$$SD_{pw}^2 = \frac{\sum W \sum W X^2 - (\sum W X)^2}{(\sum W)^2} \tag{4}$$

If the beta weights are not available, but other useful bases for weighting
each scale in proportion to its importance in a given performance are
available, such judgment-derived weights might be used in this formula.
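
The gross score formulas lend themselves to a short computation. The sketch below is only an illustration: the function names, the two four-score profiles, and the weights are hypothetical and are not taken from the cadet data. It computes the intra-profile variance from the gross score form, (sum of W times sum of WX squared, minus the square of the sum of WX) divided by the square of the sum of W, which reduces to the unweighted form when every weight is 1, and checks it against the deviation form.

```python
def intra_profile_variance(scores, weights=None):
    """Gross score form: (sum W * sum W X^2 - (sum W X)^2) / (sum W)^2.

    With no weights every W is 1 and the expression reduces to
    (n * sum X^2 - (sum X)^2) / n^2.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    sum_w = sum(weights)
    sum_wx = sum(w * x for w, x in zip(weights, scores))
    sum_wx2 = sum(w * x * x for w, x in zip(weights, scores))
    return (sum_w * sum_wx2 - sum_wx ** 2) / sum_w ** 2


def deviation_form(scores, weights=None):
    """Deviation form: weighted mean of squared deviations from the weighted mean."""
    if weights is None:
        weights = [1.0] * len(scores)
    sum_w = sum(weights)
    mean = sum(w * x for w, x in zip(weights, scores)) / sum_w
    return sum(w * (x - mean) ** 2 for w, x in zip(weights, scores)) / sum_w


# Hypothetical standard-score profiles with the same mean (500).
consistent = [500, 510, 490, 500]
uneven = [650, 350, 600, 400]
weights = [2.0, 1.0, 1.0, 1.0]  # hypothetical judgment-derived weights

print(intra_profile_variance(consistent))       # small: consistent across the board
print(intra_profile_variance(uneven))           # large: very uneven profile
print(intra_profile_variance(uneven, weights))  # weighted version
```

Two individuals with identical mean scores can thus differ greatly on this statistic, which is the kind of difference Table 1 is meant to convey.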

Although the United States Military Academy, West Point, currently
compares favorably with the best universities in the effectiveness with
which academic performance is predicted, this still means that only 50 percent
of the factors that make for differences in level of academic performance are
being measured. Though proud of our relative success, we would like to reduce
the variance not predictable currently.

Appropriate  basic  data  were  available  on  the  843  cadets  who  entered  USMA 
in  July  1964  and  completed  enough  of  the  first  year  at  USMA  to  have  academic 
grades;  816  remained  one  year  and  789  remained  one  and  one-half  years.  Table 
2  shows,  for  the  843  cadets,  the  correlation  coefficients  between  the  following: 

(1)  the  academic  average  earned  at  USMA — "Acad  Av"; 

(2) the weighted average of Scholastic Aptitude Test-Verbal, SAT-Mathematics,
College Entrance Examination Board Mathematics
Achievement, CEEB English Composition, and High School Rank
standard score, the five components of the academic potential
battery, "Acad Pot";

(3) the standard deviation of the weighted scores on these five
components, the "SDPW" intra-profile statistic;

(4) the Academic Achievement Index, a statistic reflecting the
academic average with measured academic potential held constant
(partialed out) and thus identifying over-achievers, par-achievers,
and under-achievers, "AAI."
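
One way to read "held constant (partialed out)" is as the residual of the academic average after a straight-line fit on the academic potential score. The sketch below follows that reading and is an assumption, not the Academy's documented procedure: the function name, the five data points, and the rescaling to a mean of 500 and standard deviation of 100 (suggested by the totals in Table 3) are all illustrative.

```python
from statistics import mean, pstdev


def achievement_index(potential, average, scale_mean=500.0, scale_sd=100.0):
    """Residuals of `average` after a least-squares line on `potential`, rescaled."""
    mx, my = mean(potential), mean(average)
    sxx = sum((x - mx) ** 2 for x in potential)
    sxy = sum((x - mx) * (y - my) for x, y in zip(potential, average))
    slope = sxy / sxx
    # Residual = observed average minus the average predicted from potential.
    residuals = [(y - my) - slope * (x - mx) for x, y in zip(potential, average)]
    spread = pstdev(residuals)
    return [scale_mean + scale_sd * r / spread for r in residuals]


# Hypothetical cadets: academic potential score and earned academic average.
potential = [520, 560, 600, 640, 680]
average = [2.10, 2.50, 2.30, 2.60, 2.70]

index = achievement_index(potential, average)
for p, a, v in zip(potential, average, index):
    print(p, a, round(v))
```

Values above 500 mark cadets who earned more than their potential score predicts (over-achievers), and values below 500 mark under-achievers.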


TABLE 2. Selected Correlation Coefficients*

r (Acad Av · SDPW)              = -.06
r (Acad Pot · SDPW)             = -.23
r (Acad Av · Acad Pot)          =  .68
r (AAI · SDPW)                  =  .12

eta (SDPW on AAI)               =  .21  (F = 1.56; .10 > P > .05)

eta (AAI on SDPW)               =  .17  (F = 0.86; P > .10)

R (Acad Av · Acad Pot, SDPW)    =  .69

*Means and standard deviations on each variable are given in Table 3.

Even with a population of 843, simply adding the SDPW to the regression
equation did not significantly increase the validity with which level of
academic achievement was predicted. However, inspection of the AAI line-of-means
revealed the marked tendency for the 241 cadets with a SDPW of 70 or
more to be over-achievers. In fact, the 15 cadets with SDPW's greater than
160 had a mean Academic Average more than one standard deviation above that
predicted from their measured academic potential. The 602 cadets with SDPW's
of 69 or less tended to be under-achievers relative to the academic achievement
predicted from their composite academic potential score. Hence the possibility
was explored that the SDPW would serve as a moderator variable (an executive
variable) to identify two groups in which the interrelationships of the
variables involved were sufficiently different that different equations for
the two groups would yield more valid predictions.

For the total group of 843 cadets, the multiple correlation of the five
regular academic potential components with the Academic Average was .694; for
the 602 cadets with SDPW's of less than 70, the independently computed multiple
correlation was .695; for the 241 whose SDPW's were 70 or more, the multiple
correlation was .693. The beta weights in all three equations were almost
identical. The hypothesis that the interrelationships among these variables
were the same for two groups identified by a critical SDPW of 69.5 could not
be rejected.


Although neither of the above approaches successfully utilized intra-profile
variance, Table 3 shows clearly that the cadets who made the most
of their objectively measured academic potential had significantly higher
intra-profile variance on the five component measures. Perhaps the individual
differences model for multiple regression proposed by Dr. Cleary (1966) and
discriminant function analyses will show how to use the statistic in this
instance. At any rate, the ease with which this statistic can be obtained,
along with other statistical data at no extra cost, would seem to warrant its
incorporation into the model for validity studies. This would be especially
true when there is reason to hypothesize that high intra-individual variance
would be desirable or when across-the-board consistency in performance is
desired.

TABLE 3

CHARACTERISTICS OF SELECTED GROUPS OF CADETS
ENTERING USMA IN 1964

                                Academic      Academic         Academic
                                Potential     Average          Achievement
                                Score                          Index (AAI)   SDP         SDPW
Group                       N     M     SD        M     SD       M    SD      M    SD     M    SD

"Over-achievers"
(Top 27% on AAI)          218   605     59    2.559    .113    622     48     70    33    73    40

"Par-achievers"
(Middle 46%)              379   600     53    2.413    .107    501     35     66    27    67    31

"Under-achievers"
(Bottom 27% on AAI)       219   601     55    2.272    .113    375     50     62    24    63    29

Total 1 year              816   602     55    2.415    .152    500    100     66    28    68    33

Total with grades         843   599     56    2.406    .158    500    100     66    28    67    33

Several other applications of this statistic may be useful. In the
military personnel situation, intra-individual differences may be of considerable
utility. In general, a man who is rather uniformly high in all areas of his
military specialty might be considered to be more valuable to his service, in
the series of successive assignments throughout his military career, than would
a man who is very high in some areas and very low in others. One of the latter
men might work out well in one assignment and be a complete failure in others.
Thus, the utility of a soldier's weighted intra-profile variance on pertinent
measures of his abilities seems to warrant investigation. In the physical
and biological sciences and technologies as well as in the behavioral sciences

and technologies an intra-profile variance type of statistic might be found
useful. For example, reasonably accurate estimation of the probability
of failure of a separate component or unit of equipment usually is possible.
However, considerable difficulty often is encountered when using standard
statistical techniques to estimate the composite failure probability of a
complex assembly of a large number of these component units. An exploratory
approach might begin by comparing the distributions of the weighted intra-profile
variance of each component unit's significant characteristics,
measured under standard conditions, for component units at different levels
on the best available reliability statistic. Where adequately detailed
records are available, data on the past success and failure of complex
assemblies might be compared with distributions of the intra-profile variance
of the characteristics of all of their component units, including the intra-profile
variance of the component units' intra-profile variances. An appropriate
model for such an investigation could be developed by quality control
researchers for a specific type of equipment.

A STATISTICAL TEST OF TWO HYPOTHETICAL RELIABILITY
GROWTH CURVES OF THE LOGISTIC FORM IN THE DISCRETE CASE

William P. Henke
Research Analysis Corporation
McLean, Virginia

ABSTRACT. This paper demonstrates a mathematical method by which curves
can be developed for use as a tool in monitoring reliability growth, and
also illustrates how statistical tests of hypotheses
may be conducted in conjunction with these growth curves.

The growth curves discussed are applicable for use where units are undergoing
development phases; specifically where it is desired to periodically
assess the actual reliability growth of these units for comparison with hypothetical
reliability growth curves.

The unique facet of these growth curves as presented herein lies in their
use.  Since  their  application  is  directed  towards  the  improved  development  of 
a  unit  type,  this  development  is  dependent  upon  the  reliability  achieved  as  a 
result  of  improvements  made  on  previously  tested  units.  The  reliability,  or 
probability  of  success,  at  each  stage  of  development  is  independent  and  varies 
from  stage  to  stage. 

A curve embodying the assumptions necessary for the measurement of reliability
growth during development is termed the Logistic curve. Two such curves are
plotted, representing two alternative hypothetical growth patterns based on
specified values of a unit's inherent reliability. From the observed sample
proportion of successes (Reliability) accumulated at some trial of the development
program, a selection is made of the true curve of the unit or group of units that
just finished the test. If the upper growth curve is actually true of the
population from which the sample of units is randomly drawn, a small risk, α,
is desired that the sample would be so poor as to bring rejection of this curve.
Likewise if the lower growth curve is true, a very small risk, β, is desired
that the sample will be so good as to bring erroneous acceptance of the upper
curve.

The  subject  curves  have  been  found  useful  in  the  past  to  study  population 
growth, learning and developmental processes. The application of the Logistic
Growth  Curve  concept  in  assessing  reliability  has  only  recently  been  directed 
toward  the  engineering  development  of  expensive  electronic  components.  Prior 
to  this  use,  extensive  literature  search  had  not  revealed  its  application  for 
this  purpose. 

The  concept  of  reliability  growth  during  the  development  stages  is  one 
which should be emphasized throughout governmental and industrial circles. The
growth  pattern  concept,  saving  time  and  money,  can  also  assist  in  creating  a 
better  understanding  between  the  consumer  and  the  producer  regarding  their 
mutual  problems,  through  the  joint  visual  monitoring  of  a  statistically  sound 
method  of  reliability  assessment. 


The remainder of this article was reproduced photographically from the author's
copy.

Introduction


Reliability  evaluation  is  as  essential  a  task  during  the  development 
of  a  unit  as  it  is  during  the  production  of  the  unit.  During  development, 
reliability  is  directly  affected  by  a  necessary  and  thorough  knowledge  of 
the  use  and  capabilities  of  the  proposed  unit.  Continuous  reliability 
design  analyses  and  engineering  changes  on  the  unit  cause  a  developmental 
growth  pattern  which  must  be  identified.  This  identification  is  necessary 
in  order  that  a  trend  can  be  predicted  and  the  reliability  requirement  can 
be  quantitatively  specified  for  use  in  the  evaluation  of  the  unit  during 
the production. This developmental growth pattern is dependent upon the
reliability achieved as a result of improvements made on previously tested
units. A mathematical function which has been found useful in the past to
describe population growth, learning and developmental processes, and more
recently to fit the engineering developmental growth pattern of mechanical
and electronic components, is the S-shaped growth function presented herein
as the reliability growth model.

During the reliability development phase, the first unit is put to
test. Its performance is judged a failure or success. Subsequently, the
unit's reliability is assessed, an analysis is made of its performance and
design improvements are made. These improvements are built into another
unit (or the same unit if no damage was done on the first test) which then
undergoes the same process; that is, testing for failure or success. Again
improvements are made and the cycle is repeated. By such a procedure, it
is intended that reliability will grow from some low initial value (state-of-the-art)
to a higher target value at the end of the program.


The  Problem 

It is not unusual that although the inherent reliability of a
unit is growing properly, the "sample values of tested units" may vary
enough to present a poor reliability picture. "Sample values of units"
here means that the one unit tested was only one out of many (of a
population) that could have been tested. Thus even if reliability is
high, a rash of failures in a sample can occur and cast doubts upon
the inherent reliability. Of course, it can also happen that a unit
with low inherent reliability will by chance produce a high number of
successes in a sample, possibly resulting in wrongful acceptance of
the unit as being satisfactory. It is against these possibilities of
error that reliability statisticians direct themselves when designing
meaningful test programs.

Two reliability growth curves are plotted on Figure 1, representing
two alternative specified values of a unit's inherent reliability. From
the observed sample proportion of successes (Reliability) accumulated
at some trial of the development program, we wish to choose
which curve is true of the unit or group of units that just finished
the test. If the upper growth curve is actually true of the population
from which the sample of units is randomly drawn, we only want a small
risk, α, that the sample would be so poor as to bring rejection of
this curve. Likewise if the lower growth curve is true, we want a
very small risk, β, that the sample will be so good as to bring erroneous
acceptance of the upper curve. These two risks are usually
specified by the experienced engineer or manager who must also consider
such things as delivery time, cost, availability of test equipment and
many other facets which contribute towards the profit picture of an
industry.

Mathematical Derivations
The Growth Function

S-shaped curves of growth functions have been arrived at by many
learned people as cited in the references on growth. However, Herbert
K. Weiss [1], working in a reliability sense by the method of maximum likelihood,
arrived at S-shaped growth curves by starting with the assumptions that
each failure source in a system has a parameter failure rate and that a
constant probability exists that each failure source will be properly
discovered and corrected by way of development engineering. A. Held
[2] arrived at the same form of the curve through the use of differential
equations. This section will concern itself with characterizing processes
by differential equations from which reliability growth will be derived.

Let x denote time or the magnitude of a growth factor which influences
the size y of the observed phenomenon. Then the differential
coefficient dy/dx denotes the rate of growth; i.e., the increase
per unit of time. At this point, the growth process can be characterized
by:

$$\frac{dy}{dx} = f(x, y),$$

which indicates the growth rate depends both on time (x) and on the size
obtained (y). We shall only deal with special cases of the type:

$$\frac{dy}{dx} = f(y)\,g(x), \tag{1}$$

which may be written as:

$$\frac{dy}{f(y)} = g(x)\,dx,$$

in differential notation.

Integration yields

$$F(y) = G(x), \tag{2}$$

where $F(y) = \int dy / f(y)$ and $G(x) = \int g(x)\,dx$, the constant of
integration being absorbed in $G$. Thus by means of (2) y is determined as a function of x.

To apply (1) to a specific case, we take the situation whereby the
growth rate is proportional to the achieved reliability R, and to a
function of time, g(t), as:

$$\frac{dR}{dt} = R\,g(t); \tag{3}$$

rearranging terms and using differential notation we obtain:

$$\frac{dR}{R} = g(t)\,dt. \tag{4}$$

Since $dR/R = d \ln R$, we have from (4)

$$\frac{d \ln R}{dt} = \frac{dR}{dt}\,\frac{1}{R}, \tag{5}$$

which is called the logarithmic differential coefficient to be used
further.

Introducing 1, R, $\lambda - R$, and $R(\lambda - R)$ in turn as the factor
multiplying g(t) in (3), we obtain the following four differential equations:

$$\frac{dR}{dt} = g(t), \qquad \frac{dR}{dt} = R\,g(t), \qquad \frac{dR}{dt} = (\lambda - R)\,g(t), \qquad \frac{dR}{dt} = R(\lambda - R)\,g(t), \tag{6}$$

where $\lambda$ denotes the maximum value of R.

The four equations respectively indicate that at a given time, the
reliability growth rate (1) depends on time but is independent of the
size reached, (2) is proportional to the size reached and to a function
of the time, (3) is proportional to the "remaining size," that is, the
maximum size minus the size reached, and a function of the time, and
(4) is proportional to both the size reached and the remaining size
as well as a function of the time.

We must study the character, or form, of the fourth differential
equation from equation (6), which is:

$$\frac{dR}{dt} = R(\lambda - R)\,g(t), \qquad (0 \le R \le \lambda), \quad \lambda = \text{constant}, \tag{7}$$

which is set up under the important assumptions which are worthy of
repeating: At a given number of trials, the reliability growth rate,
dR/dt, is a function of:

1) the number of trials, t,

2) the growth, R, reached at a number of trials t, and

3) the remaining growth $(\lambda - R)$ to the maximum possible reliability
value $\lambda$.

By introduction of the logarithmic differential coefficient as in
equation (5) we shall derive the reliability growth function.

Dividing (7) by R and substituting in (5), we get:

$$\frac{d \ln R}{dt} + R\,g(t) = \lambda\,g(t). \tag{8}$$

Solving (7) for R:

$$R = \frac{1}{(\lambda - R)\,g(t)}\,\frac{dR}{dt}. \tag{9}$$

Substituting (9) into (8):

$$\frac{d \ln R}{dt} + \frac{1}{(\lambda - R)\,g(t)}\,\frac{dR}{dt}\,g(t) = \lambda\,g(t),$$

or

$$\frac{d \ln R}{dt} - \frac{d \ln(\lambda - R)}{dt} = \lambda\,g(t). \tag{10}$$

[Equation (11) and the text that introduces $g_1(t)$ are not legible in the source copy.]

and

$$g_2(t) = B_0 + B_1/t + B_2/t^2,$$

one can obtain a number of examples of frequently applied growth curves.
For the reliability growth tests of the discrete case presented in this
thesis, it is sufficient to assume g(t) = B. Therefore, equation (11)
can be written as:

$$R = \frac{\lambda}{1 + A e^{-\lambda B t}}, \tag{12}$$

where $A = e^{-\alpha}$; $\alpha$ is a constant of integration.

For B > 0, equation (12) is an increasing function of t having
the asymptote $\lambda$. Since $0 \le R \le \lambda$ and $0 \le \lambda \le 1$, the desired S-shaped
reliability growth function is obtained:

$$R = \frac{1}{1 + A e^{-Bt}}. \tag{13}$$

Figure 2 shows the growth function given in (13) as the S-shaped curve
with horizontal asymptotes at R = 0 and at the maximum value of R.

The Critical Region

In designing statistical tests of hypotheses, it is necessary to
specify the size of the critical or rejection region α. α is defined as
the Type I error of the test and it is the probability that the null
hypothesis will be rejected when it is actually true. The procedure of
calculating the critical region when applying the reliability growth
function is to find the acceptance number of successes "a" such that

$$\alpha \le \sum_{i=0}^{a} R_{i,n},$$

where $R_{i,n}$ is the reliability, or probability of getting i successes
in n trials. Thus we must develop a probability for each of the $2^n$
permutations and sum these to some minimum acceptance number "a" of
successes, which equals or just exceeds α. The acceptance number of
successes, obtained from the above summation of probabilities, when
divided by its corresponding n, gives the proportion successful (a/n),
which when plotted on the same graph as the reliability growth curves,
outlines the critical region.

The probabilities of each of the permutations can be computed by
the powerful device of generating functions as outlined by Uspensky [3].
The generating function, $\phi(\xi)$, for this problem is:

$$\phi(\xi) = \prod_{i=1}^{n} (R_i \xi + Q_i) = \sum_{i=0}^{n} R_{i,n}\,\xi^{i}, \qquad Q_i = 1.00 - R_i,$$

where the coefficients will give both the permutations and the
probabilities.

[The worked expansion for n = 3 is not legible in the source copy.]

These terms are now individually evaluated and
the coefficients of the dummy variable $\xi$ are summed from the $\xi^0$ term
through the $\xi^a$ term. When the summation equals or just exceeds the
preassigned α value, the exponent of $\xi$ is divided by 3. This proportion
is then plotted to outline the critical region for n = 3.
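
The generating-function device amounts to multiplying out the n factors (Q_i + R_i ξ) and reading off the coefficients. The sketch below does this with plain list arithmetic and then accumulates the coefficients from the zero-success term upward until the running sum equals or exceeds α. The three reliabilities and the α value are hypothetical, and the function names are illustrative only.

```python
def success_distribution(reliabilities):
    """Coefficients of prod(Q_i + R_i * x): entry k is P(exactly k successes)."""
    coeffs = [1.0]
    for r in reliabilities:
        q = 1.0 - r
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * q      # this trial fails: success count unchanged
            nxt[k + 1] += c * r  # this trial succeeds: success count goes up by one
        coeffs = nxt
    return coeffs


def acceptance_number(reliabilities, alpha):
    """Smallest a whose cumulative probability of 0..a successes reaches alpha."""
    total = 0.0
    for a, p in enumerate(success_distribution(reliabilities)):
        total += p
        if total >= alpha:
            return a
    return len(reliabilities)


upper = [0.5, 0.6, 0.7]   # hypothetical R_1, R_2, R_3 on the upper growth curve
pmf = success_distribution(upper)
a = acceptance_number(upper, alpha=0.10)

print(pmf)             # probabilities of 0, 1, 2, 3 successes
print(a, a / len(upper))  # acceptance number and the proportion a/n to plot
```

Because each trial has its own reliability, the coefficients are not binomial probabilities; the polynomial product handles the unequal R_i without listing all permutations explicitly.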

Testing One Growth Curve Against an Alternative

The mathematical model for this problem is based on the assumption
that at each ith (discrete) trial in a development program the reliability,
$R_i$, of the unit is given by the growth curve:

$$R_i = \frac{1}{1 + A e^{-Bi}}, \quad \text{where A and B are constants.}$$

Two curves are considered: the upper (desired growth) and the lower
(undesired growth).

The upper growth curve, our hypothesis, is determined by the initial
current state-of-the-art reliability which is given to be $R_0$, the desired
or target reliability which is a specified value $R_N$ at program's
end, and the total number of trials for the entire development program,
i = N.

The lower growth curve, the alternative, is determined in the same
manner as the upper curve, with $R'_0$ being the minimum permissible level
and $R'_N$ being the minimum target level at i = N trials.

A single unit is to be tested at each ith stage of the development
program, registering either a failure or success. Even if the upper
specified curve is a true characteristic of the unit, the random
variations of sampling units will produce observed proportions of successes
at each ith test which will deviate quite widely from the trend
of the basic growth curve. Each "path" or "random walk" of the observed
proportion successful depends on the permutation of successes,
S, and failures, F, that can result in sampling units when the specified
$R_i$ is the probability of success and $Q_i = 1.00 - R_i$ the probability of
failure at each ith trial. The total possible random walks or permutations
of failures and successes is $2^n$.
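
For small n the random walks can simply be listed. The sketch below enumerates every permutation of successes (S) and failures (F), attaches its probability as the product of the matching R_i or Q_i, and records the observed proportion successful after each trial. The three reliabilities are hypothetical and the function name is illustrative.

```python
from itertools import product


def enumerate_paths(reliabilities):
    """Return (outcome string, probability, running proportion successful) per path."""
    paths = []
    for outcome in product("SF", repeat=len(reliabilities)):
        prob = 1.0
        successes = 0
        proportions = []
        for trial, (result, r) in enumerate(zip(outcome, reliabilities), start=1):
            if result == "S":
                prob *= r          # success at this trial, probability R_i
                successes += 1
            else:
                prob *= 1.0 - r    # failure at this trial, probability Q_i
            proportions.append(successes / trial)
        paths.append(("".join(outcome), prob, proportions))
    return paths


paths = enumerate_paths([0.5, 0.6, 0.7])   # hypothetical R_1, R_2, R_3
for name, prob, proportions in paths:
    print(name, round(prob, 4), [round(p, 3) for p in proportions])
```

With n = 3 there are 2^3 = 8 paths, and their probabilities sum to one; grouping them by number of successes gives the same totals as the generating-function coefficients.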


Types of Error

In designing statistical tests of hypotheses, it is necessary to
specify the size of the critical or rejection region α, also called
the producer's risk. Thus we must develop a probability for each of
the above permutations and sum these to some minimum acceptance number,
"a," of successes which just exceeds probability α. When "a" is divided
by its corresponding n, the proportions obtained outline the critical
region.

However, if some lower, undesirable growth curve, which does not
reach target $R_N$, is actually true of our system there is some risk or
chance β that the observed proportion will not fall in the rejection
region, resulting in erroneous acceptance of the system.

Since the α error (Type I) is predetermined, we will show the
derivation of the β error (Type II).

1. Use the upper growth curve $R_i$'s for a given α value to calculate
the acceptance number of successes "a" such that:

$$\alpha \le \sum_{i=0}^{a} R_{i,n},$$

where $R_{i,n}$ is the total probability of "i" successes in n trials
considering all possible permutations.

2. Use the number of successes "a" obtained from the first step,
applying to the lower growth curve to calculate β values such that

$$\beta = \sum_{i=a}^{n} R'_{i,n}.$$

The calculation can be conceptually diagrammed as below (although
in actuality we are dealing with discrete distributions):

[Figure 3. Relationship of α and β error calculations, showing the lower growth curve and the upper growth curve.]

It should be noted that the errors of α and β pertain only to
each value of n. No attempt has been made to evaluate the overall error
for the decision procedure; namely, we do not know what is the probability
of accepting or rejecting R or R', independent of the number of items
tested.
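
The two steps can be sketched in a few lines. The reading adopted here is an assumption: the material is accepted when the number of successes is at least "a", so that β is the probability of "a" or more successes under the lower curve. The reliabilities for both curves and the α value are hypothetical, and the function names are illustrative.

```python
def success_distribution(reliabilities):
    """Entry k is the probability of exactly k successes in the listed trials."""
    coeffs = [1.0]
    for r in reliabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - r)
            nxt[k + 1] += c * r
        coeffs = nxt
    return coeffs


def acceptance_number(upper, alpha):
    """Step 1: smallest a with P(0..a successes | upper curve) >= alpha."""
    total = 0.0
    for a, p in enumerate(success_distribution(upper)):
        total += p
        if total >= alpha:
            return a
    return len(upper)


def beta_risk(lower, a):
    """Step 2: P(a or more successes | lower curve), the risk of wrongful acceptance."""
    return sum(success_distribution(lower)[a:])


upper = [0.5, 0.6, 0.7]   # hypothetical reliabilities on the upper growth curve
lower = [0.3, 0.4, 0.5]   # hypothetical reliabilities on the lower growth curve

a = acceptance_number(upper, alpha=0.10)
beta = beta_risk(lower, a)
achieved_alpha = sum(success_distribution(upper)[:a])  # P(fewer than a | upper curve)

print(a, round(achieved_alpha, 4), round(beta, 4))
```

With only three trials β is very large, which illustrates why the errors must be examined separately for each value of n.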

Construction of Growth Curves
Calculation Instruction: For upper growth curves

1. At each ith (discrete) trial in a development program the
reliability $R_i$ of the unit is given by the growth curve:

$$R_i = \frac{1}{1 + A e^{-Bi}}. \tag{14}$$

The constants A and B can be obtained in the following manner:

For A, let i = 0; we have

$$R_0 = \frac{1}{1 + A e^{-B \cdot 0}}, \qquad A = \frac{1}{R_0} - 1. \tag{15}$$

For B, let i = N; we have

$$R_N = \frac{1}{1 + A e^{-BN}}. \tag{16}$$

Substituting the value of A in (15) into (16) we have

$$e^{BN} = \frac{R_N (1 - R_0)}{R_0 (1 - R_N)},$$

that is

$$e^{-BN} = \frac{R_0 (1 - R_N)}{R_N (1 - R_0)}, \quad \text{and} \quad B = \frac{1}{N} \log_e \frac{R_N (1 - R_0)}{R_0 (1 - R_N)}.$$

2. Assign a number (= 5, 10, 15, ..., N) to i in formula (14) to
calculate its corresponding $R_i$, which will make up the body of the table
of upper-growth-curve values.

Similarly we can obtain the values of lower growth curves.
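
The calculation instructions translate directly into code. In the sketch below A = 1/R_0 - 1 and B = (1/N) log_e[R_N(1 - R_0) / (R_0(1 - R_N))], and the curve is then tabulated at every fifth trial. The initial reliability of .20 and the target of .993 are the values quoted in the brake-lining example; the program length N = 40 and the function names are assumptions made only for illustration.

```python
import math


def growth_constants(r0, rn, n_trials):
    """Constants A and B of R_i = 1 / (1 + A * exp(-B * i)) from R_0, R_N and N."""
    a = 1.0 / r0 - 1.0
    b = math.log(rn * (1.0 - r0) / (r0 * (1.0 - rn))) / n_trials
    return a, b


def growth_curve(r0, rn, n_trials, step=5):
    """Table of growth-curve values at trials 0, step, 2*step, ..., N."""
    a, b = growth_constants(r0, rn, n_trials)
    return {i: 1.0 / (1.0 + a * math.exp(-b * i))
            for i in range(0, n_trials + 1, step)}


# R_0 = .20 and R_N = .993 as in the example; N = 40 trials is hypothetical.
a, b = growth_constants(0.20, 0.993, 40)
curve = growth_curve(0.20, 0.993, 40)

print(round(a, 4), round(b, 4))
for trial, value in sorted(curve.items()):
    print(trial, round(value, 4))
```

The same two functions, called with the minimum permissible initial and target levels, give the lower growth curve.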

Due  to  ever  incivunin",  prvKieti'-.or  and  freight  air  travel  and  resul.tunt  [ ■■ 

j  i 

need  for  more  rapid  'Jurn  around.  of  equipment,  a  contract  was  let  to  a  bond-  jjj 

ing  material  manufacturer  to  develop  a  considerably  more  effective,  but  more  ; 

■( 

expensive  metallic  broke  lining,  evaluate  it,  and  measure  its  effect  on  the 
braking  system  of  the  aircraft.  At  a  given  braking  horsepower  in  ft.  lbs/sec., 
the  criteria  for  determining  nuccoos  S  or  failure  F  of  the  material  aro  two-fold; 

1.  The  maximum  wear  of  tho  lining  is  not  to  exceed  inch^/,  and  j 

2.  The  maximum  wear  of  tho  bell  (brake  drum)  is  not  to  exceed 
—  inch  ^ 

Any  brake  lining  which  could  riot  meet  these  two  criteria  were  classified  ^ 

as  rejects  (failure),  Binca  these  criteria  are  considered  to  he  critical 
defects,  if  exceeded.  Mo  previously  tested  linings  can  be  retested. 

A  pre-design  meeting  was  hold  with  attendees  representing  management, 
the  customer,  engineering,  purchasing  and  reliability.  Since  the  reliability 
of  this  lining  was  of  prime  importance,  reliability  chaired'the  mooting. 

The most significant points made in the meeting were that the cost of the
metallic material required is extremely high, and the required reliability
was .993; that is, on the average the customer was willing to live with
seven lining failures in 1000. It was also mutually agreed that a six
percent probability of rejecting the material was allowed when the sample
showed poor material coming from a good lot. This is known as the
producer's risk ($\alpha$ error). The initial current state-of-the-art
reliability of the brake lining was given to be 20 percent. The undesired
or alternative initial reliability was given to be 19 percent,
with the alternative final reliability value being .935. This value
of .935 was chosen since a review of the $\beta$ values for this combination
of .935 and .993 indicated that the power of the test (1.00 - $\beta$)
was at a desirable level, considering the cost of the material and the
alpha error. The beta error was an eight percent probability of
accepting the material when the sample showed good material coming
from a bad lot.

1/ Classified information, in thousandths of an inch, with braking applied
for x hours.

There  were  four  critical  environmental  tests  which  the  braking 
material  was  required  to  pass.  These  were: 

1.  Humidity 

2.  Temperature 

3.  Shock 

4.  Vibration

A  success  in  one  particular  environment  does  not  mean  that  the 
specific  lining  would  have  passed  in  another  environment.  It  was 
decided  that  since  two  brake  linings  were  required  to  simulate  a  braking 
system,  the  linings  for  two  shoes  at  a  time  would  be  manufactured,  tested 
two at a time on each of the four environmental tests and their reliability
evaluated and growth structure monitored. Because of cost, the total
manufactured sample size was limited to 72 pairs of shoes.




The test results up to and including test number 40 are shown in
Table 1 and are plotted on Figure 4.


Table 1

BRAKE BONDING MATERIAL
TEST DATA SHEET

Number of       Number of     Cumulative Number    Cumulative Proportion
Tests (Pair)    Successes     of Successes         of Successes (Reliability)

      4             1               1                     .25
      8             0               1                     .12
     12             3               4                     .33
     16             3               7                     .44
     20             2               9                     .45
     24             3              12                     .50
     28             2              14                     .50
     32             1              15                     .47
     36             2              17                     .47
     40             2              19                     .48

Figure 4 illustrates the test results given in Table 1 plotted in
increments of four pairs of linings, the upper desired reliability growth
curve, $R_i$, the alternative lower undesirable reliability growth curve, $R'_i$,
and the critical or reject region. As can be seen, the reliability was not
growing as desired, so the manufacturing and testing were halted after test
number 40. A very strict analysis was ordered of the design before authorization
was given to proceed further.

[Figure: Two Hypothetical Reliability Growth Curves; horizontal axis: Number of Tests (Pairs)]
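The cumulative proportions in Table 1 can be set beside the two growth curves numerically. This is a small Python sketch under stated assumptions: the curve end points (.20 to .993 for the upper curve, .19 to .935 for the lower curve, N = 72 pairs) are read from the text of the example, and the helper name `growth_curve_value` is hypothetical. It does not compute the critical region itself.

```python
import math
from itertools import accumulate

# Successes observed in each increment of four pairs (Table 1).
successes_per_increment = [1, 0, 3, 3, 2, 3, 2, 1, 2, 2]
tests = [4 * (k + 1) for k in range(len(successes_per_increment))]
cumulative = list(accumulate(successes_per_increment))
observed = [c / n for c, n in zip(cumulative, tests)]


def growth_curve_value(r0, rn, n_total, i):
    """R_i = 1 / (1 + A * exp(-B * i)) with A, B fixed by R_0 and R_N."""
    a = (1.0 - r0) / r0
    b = -(1.0 / n_total) * math.log((r0 * (1.0 - rn)) / (rn * (1.0 - r0)))
    return 1.0 / (1.0 + a * math.exp(-b * i))


# Assumed end points, taken from the brake lining example in the text.
upper = [growth_curve_value(0.20, 0.993, 72, n) for n in tests]
lower = [growth_curve_value(0.19, 0.935, 72, n) for n in tests]

if __name__ == "__main__":
    print("tests  observed  lower   upper")
    for n, o, lo, up in zip(tests, observed, lower, upper):
        print(f"{n:5d}  {o:8.3f}  {lo:.3f}   {up:.3f}")
```

With these assumed curves the observed proportion after 40 pairs lies below both curves, which agrees with the decision to halt testing.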


Conclusions : 

The concept of reliability growth during the development stages is one
which should be emphasized throughout governmental and industrial circles. The
growth pattern, saving time and money, can also create a better understanding
by the consumer and the producer of their problems through the visual monitoring
of statistically sound methods of assessment. The producer and consumer
should get together before development to understand and agree on the following
items related to the monitoring of the to-be-developed unit's reliability:

1.  The current state-of-the-art reliability.

2.  The desired final reliability value at program's end.

3.  The alternative, or undesired, reliability values corresponding to
steps 1 and 2.

4.  The inspection size and final inspection size.

5.  Null hypothesis and alternative(s).

6.  Alpha error, beta error and power of the test.

Thus a thorough knowledge of the ability and use of the subject items will
be overlapped with a sound statistical technique for use in assessing the
proposed item during development.

When choosing reliability curves of the type presented herein for use in
describing the growth pattern of a particular item in development, care must
be taken in selecting proper combinations of sample size and pre-assigned alpha
values. Small alpha values will tend to be equalled or exceeded rather quickly
when the sample size is quite small, say 10 or less. For larger sample sizes,
the values of alpha are not as quickly equalled or exceeded, and when exceeded,
the cumulative probabilities closely approximate the preassigned alpha values.
Of course, the power of the test (1.00 - Beta) will assist the choice of the
proper combinations of alpha and the sample size.
PROGRAMMING THE GROWTH MODEL


Introduction

The reliability growth model program was written in Fortran IV
language and was run on the IBM 7040 computer. The program is flexible
in the sense that positive or negative step sizes are permitted in
choosing sequences of upper or lower curves. It is also possible to
skip certain curves in a sequence of upper or lower curves. The program
will run approximately twenty minutes on the 7040 for 60 combinations
of upper and lower reliability growth curves and 50 different
values for sample size. The program listing is included for use by
those wanting to generate curves, critical regions and $\beta$ errors.

Please note the program statement numbers are included in brackets
to the right of the appropriate statements.

Description 

The reliability growth program calculates the quantities $\text{Prob}_j$,
j = 1, ..., n, outlining the critical regions corresponding to different
values of the Type I or $\alpha$ error, $\alpha_j$, j = 1, ..., n, and the
corresponding Type II or $\beta$ error represented by the quantities
$\text{Beta}_j$, j = 1, ..., n, when given the following:

1.  various inspection sizes of i components ranging from 0 to
a total of N components,

2.  a reliability $R_{i,\text{upper}}$ which represents the expected reliability
level for the inspection size of i components; i.e. the observed
reliability $R_i$ (ratio of number of accepted components to the accumulated
total number i of components which have been inspected at a given
stage of time) is not to fall within the critical region determined by
the value of the Type I error, $\alpha$, and the value of $R_{i,\text{upper}}$, and,

3.  a reliability $R_{i,\text{lower}} < R_{i,\text{upper}}$ which represents an
alternative reliability level for inspection sizes of i components such that,
if the observed reliability for the inspection size of i components
is less than $R_{i,\text{lower}}$, we wish to calculate the probability $\beta$ of
committing a Type II or $\beta$ error, where $\beta$ is defined as the probability
that the observed reliability $R_i$ does not fall within the critical region
determined by $R_{i,\text{upper}}$ and the specified value of $\alpha$, but in actuality the
expected reliability at the given stage is given by $R_{i,\text{lower}}$.

The quantities $\text{Prob}_1 = a_1/i$, $\text{Prob}_2 = a_2/i$, $\text{Prob}_3 = a_3/i$, ...,
$\text{Prob}_n = a_n/i$ 1/ computed for the four choices $\alpha_1$, $\alpha_2$, ..., $\alpha_n$
of $\alpha$ for each inspection size i represent the proportion of the number of
successes a, or reliable components, to the number i of components which have been
inspected at the given stage. Therefore, $\text{Prob}_1$, $\text{Prob}_2$, ..., $\text{Prob}_n$ at
any given stage of inspection of lots of i components will be functions
of $R_{i,\text{upper}}$ and of $\alpha_1$, $\alpha_2$, ..., $\alpha_n$ respectively. The quantities
$\text{Beta}_1$, $\text{Beta}_2$, ..., $\text{Beta}_n$ represent the Type II or $\beta$ errors which are
functions respectively of the values $a_1$, $a_2$, ..., $a_n$ of number of successes
computed previously in the determination of $\text{Prob}_1$, $\text{Prob}_2$, ..., $\text{Prob}_n$ and
functions also of $R_{i,\text{lower}}$.

1/ (where $a_1$, $a_2$, ..., $a_n$ are the different values of "a" corresponding
to the four choices $\alpha_1$, $\alpha_2$, ..., $\alpha_n$ of $\alpha$.)

The input to the program is specified by various combinations of $R_0$,
$R_N$, $R'_0$, $R'_N$ and various inspection sizes i of components where

$R_0$, $R_N$ are initial and target reliabilities respectively which are
used to compute the ordinate points $R_{i,\text{upper}}$ on the upper growth curve
corresponding to the abscissa points i representing inspection size, and

$R'_0$, $R'_N$ are initial and target reliabilities respectively which are
used to compute the ordinate points $R_{i,\text{lower}}$ on the lower growth curve
corresponding to the abscissa points which represent inspection size i.

The initial inspection size, the step between inspection sizes,
and the largest inspection size or total N of components may be varied
without altering the program. Also the values of $\alpha$ may be varied
where the notation convention $\alpha_1 < \alpha_2 < \alpha_3 < \dots < \alpha_n$ is to be
observed. In choosing various combinations of $R_0$, $R_N$, $R'_0$, $R'_N$ any
initial values of $R_N$, $R'_N$ may be chosen, a step for simultaneously
increasing $R_N$, $R'_N$ may be chosen, and the number of steps desired may be
chosen. Similarly the initial values for $R_0$, $R'_0$, the step size, and the
number of steps desired may be specified.

List of Symbols

Fortran Notation    Statistical Notation                   Description

HRO                 $h_{R_0}$                              Step size between succeeding values of $R_0$
                                                           (must be the same as step size between
                                                           succeeding values of $R'_0$).

HRN                 $h_{R_N}$                              Step size between succeeding values of $R_N$
                                                           (must be the same as step size between
                                                           succeeding values of $R'_N$).

ROUI                $R_{0,\text{initial}} - h_{R_0}$       If $R_{0,\text{initial}}$ is the smallest value of $R_0$
                                                           which is used in the specified combinations
                                                           of values $R_0$, $R_N$, $R'_0$, $R'_N$, then ROUI is
                                                           equivalent to $R_{0,\text{initial}} - h_{R_0}$.

ROLI                $R'_{0,\text{initial}} - h_{R_0}$      If $R'_{0,\text{initial}}$ is the smallest value of $R'_0$
                                                           which is used in the specified combinations
                                                           of values $R_0$, $R_N$, $R'_0$, $R'_N$, then ROLI is
                                                           equivalent to $R'_{0,\text{initial}} - h_{R_0}$.

RNUI                $R_{N,\text{initial}} - h_{R_N}$       If $R_{N,\text{initial}}$ is the smallest value of $R_N$
                                                           which is used in the specified combinations
                                                           of values $R_0$, $R_N$, $R'_0$, $R'_N$, then RNUI is
                                                           equivalent to $R_{N,\text{initial}} - h_{R_N}$.

RNLI                $R'_{N,\text{initial}} - h_{R_N}$      If $R'_{N,\text{initial}}$ is the smallest value of $R'_N$
                                                           which is used in the specified combinations
                                                           of values $R_0$, $R_N$, $R'_0$, $R'_N$, then RNLI is
                                                           equivalent to $R'_{N,\text{initial}} - h_{R_N}$.

IRO                                                        Total number of given values for $R_0$ (must
                                                           be the same as total number of given values
                                                           for $R'_0$).

IRN                                                        Total number of given values for $R_N$ (must
                                                           be the same as total number of given values
                                                           for $R'_N$).

MAXTRI, XMAX        N                                      Largest number of components (inspection
                                                           size) considered.

HTRI                $h_i$                                  Step size between succeeding inspection lots.

INTRI               $i_{\text{initial}}$                   $i_{\text{initial}}$ is the initial or smallest lot
                                                           which is to be sampled.

ITRI                                                       Number of inspection sizes to be sampled.

ALPHA (1)           $\alpha_1$                             Specified values of $\alpha$ such that
ALPHA (2)           $\alpha_2$                             $\alpha_1 < \alpha_2 < \alpha_3 < \dots < \alpha_n$.
ALPHA (3)           $\alpha_3$
ALPHA (n)           $\alpha_n$

AU                  A                                      $A = (1 - R_0)/R_0$

AL                  A'                                     $A' = (1 - R'_0)/R'_0$

BU                  B                                      $B = (-1/N) \log \frac{R_0 (1 - R_N)}{R_N (1 - R_0)}$

BL                  B'                                     $B' = (-1/N) \log \frac{R'_0 (1 - R'_N)}{R'_N (1 - R'_0)}$

TRIALS, NTRILS      i                                      Number of components i in inspection size
                                                           being considered at given stage in
                                                           sequential sampling procedure.

RIU                 $R_{i,\text{upper}}$                   $R_{i,\text{upper}} = 1/(1 + A e^{-Bi})$

RIL                 $R_{i,\text{lower}}$                   $R_{i,\text{lower}} = 1/(1 + A' e^{-B'i})$

QIU                 $Q_i$                                  $Q_i = 1 - R_i$

QIL                 $Q'_i$                                 $Q'_i = 1 - R'_i$

BETA (1)            $\text{Beta}_1$                        Type II or $\beta$ errors corresponding to
BETA (2)            $\text{Beta}_2$                        $\alpha_1$, $\alpha_2$, $\alpha_3$, ..., $\alpha_n$.
BETA (3)            $\text{Beta}_3$
BETA (n)            $\text{Beta}_n$

PROB (1)            $\text{Prob}_1 = a_1/i$                Answers printed out for values $\alpha_1$, $\alpha_2$,
PROB (2)            $\text{Prob}_2 = a_2/i$                $\alpha_3$, ..., $\alpha_n$ of $\alpha$ denoting ratio of
PROB (3)            $\text{Prob}_3 = a_3/i$                critical region to number inspected.
PROB (n)            $\text{Prob}_n = a_n/i$

Procedure


The procedure is as follows:

1.  Choose a particular combination of values $R_0$, $R'_0$, $R_N$, $R'_N$.

2.  Choose a particular inspection size i and final size N.

3.  Choose a particular combination of $\alpha$ values.

4.  Find the numbers $a_1$, $a_2$, $a_3$, ..., $a_n$ for given values of
$\alpha_1$, $\alpha_2$, $\alpha_3$, ..., $\alpha_n$.

5.  Divide the a's in step 4 by the inspection size i to get the
probabilities which outline the critical region.

6.  Find the corresponding $\beta$ values for each combination of
$\alpha_j$ and i.
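The six steps of the procedure can be sketched in a few lines of Python. This is an illustration only and not a transcription of the Fortran listing: it assumes that trial number k succeeds with the growth-curve reliability $R_k$, that $a_j$ is the smallest number of successes whose cumulative probability (summed from zero successes, upper curve) equals or exceeds $\alpha_j$, and that $\text{Beta}_j$ is the probability of $a_j$ or more successes under the lower curve, as in the hand calculations of this paper. The function names are hypothetical.

```python
import math


def growth_curve(r0, rn, n_total, i_max):
    """Reliabilities R_1 .. R_i_max on the curve R_i = 1/(1 + A*exp(-B*i))."""
    a = (1.0 - r0) / r0
    b = -(1.0 / n_total) * math.log((r0 * (1.0 - rn)) / (rn * (1.0 - r0)))
    return [1.0 / (1.0 + a * math.exp(-b * i)) for i in range(1, i_max + 1)]


def success_distribution(reliabilities):
    """P(k successes), k = 0..n, for independent trials with unequal reliabilities."""
    pmf = [1.0]
    for r in reliabilities:
        nxt = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            nxt[k] += p * (1.0 - r)   # this trial fails
            nxt[k + 1] += p * r       # this trial succeeds
        pmf = nxt
    return pmf


def critical_region(upper, lower, alphas):
    """Return a list of (a, a/i, beta), one entry per alpha, for i = len(upper)."""
    i = len(upper)
    pmf_upper = success_distribution(upper)
    pmf_lower = success_distribution(lower)
    results = []
    for alpha in alphas:
        cumulative, a = 0.0, i
        for k, p in enumerate(pmf_upper):
            cumulative += p
            if cumulative >= alpha:   # alpha equalled or exceeded (step 4)
                a = k
                break
        beta = sum(pmf_lower[a:])      # a or more successes, lower curve (step 6)
        results.append((a, a / i, beta))
    return results


if __name__ == "__main__":
    alphas = [0.60, 0.70, 0.80, 0.90]
    for i in (1, 2, 3):
        upper = growth_curve(0.25, 0.90, 10, i)
        lower = growth_curve(0.20, 0.80, 10, i)
        print(i, [(a, round(p, 3), round(b, 3))
                  for a, p, b in critical_region(upper, lower, alphas)])
```

The distribution of the number of successes is built up one trial at a time here, which avoids writing out the individual product terms that make the hand calculation cumbersome for larger inspection sizes.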
Fortran Program Listing
(Statement Numbers in Brackets)

      DLOGIC,MAP,FILES
      GROWTH  FULIST,REF,DD
      DIMENSION RIU(501),RIL(501),QIU(501),QIL(501),TERMU(501),
     1TERML(501),ALPHA(4),PROB(4),BETA(5),TRIALS(501)
      READ (5,1001)ROUI,ROLI,RNUI,RNLI,HROU,HROL,HRNU,HRNL,IRO,IRN,     [10]
     1MAXTRI,INTRI,HTRI,(ALPHA(I),I=1,4)
      FORMAT(8F5.3,2I3,I6,I5,F4.0/4F5.3)                                [1001]
C     INITIALIZE RELIABILITY VALUES
      ROU=ROUI
      ROL=ROLI
      DO 2001 I1=1,IRO
      ROU=ROU+HROU
      ROL=ROL+HROL
C     CALCULATE CONSTANTS A AND B FOR USE IN SOLVING FOR R(I)
      AU=(1.0-ROU)/ROU
      AL=(1.0-ROL)/ROL
      RNU=RNUI
      RNL=RNLI
      DO 2002 I2=1,IRN
      RNU=RNU+HRNU
      RNL=RNL+HRNL
      PRINT 4015,ROU,ROL,RNU,RNL
      FORMAT(10X4HROU=,F5.3,10X4HROL=,F5.3,10X4HRNU=,F5.3,              [4015]
     110X4HRNL=,F5.3//)
      XMAX=MAXTRI
      BU=(-1.0/XMAX)*ALOG((ROU*(1.0-RNU))/(RNU*(1.0-ROU)))
      BL=(-1.0/XMAX)*ALOG((ROL*(1.0-RNL))/(RNL*(1.0-ROL)))
      WRITE (6,1002)ROU,RNU,ROL,RNL,(ALPHA(I),I=1,4),(ALPHA(J),J=1,4)
      FORMAT(1H1 53X22HCRITICAL REGION CURVES///                        [1002]
     135X18HUPPER GROWTH CURVE,32X18HLOWER GROWTH CURVE//
     230X4HRO =,F6.3,8X4HRN =,F6.3,22X4HRO =,F6.3,8X4HRN =,F6.3//
     39X9HNUMBER OF,15X22HPROBABILITY OF SUCCESS,
     429X19HTYPE II(BETA) ERROR/11X6HTRIALS,
     54(2X6HALPHA=,F5.3),1X,4(2X6HALPHA=,F5.3))
      PTRIAL=INTRI
      DO 2003 I3=1,MAXTRI
      TRIALS(I3)=I3
C     CALCULATE UPPER AND LOWER RELIABILITY VALUES, R(I)
      RIU(I3)=1.0/(1.0+(AU*(2.71828**(-BU*TRIALS(I3)))))
      RIL(I3)=1.0/(1.0+(AL*(2.71828**(-BL*TRIALS(I3)))))
C     CALCULATE Q VALUES, Q(I)
      QIU(I3)=1.0-RIU(I3)
      QIL(I3)=1.0-RIL(I3)
      PI3=I3
      IF(PI3.NE.PTRIAL) GO TO 2003
      PTRIAL=PTRIAL+HTRI
      SUMU=1.0
      SUML=1.0
      BETA(1)=1.0
C     SUM PROBS OF ZERO SUCCESSES
      SUMU=SUMU*QIU(I)
      SUML=SUML*QIL(I)                                                  [5000]
C     COMPARE SUM OF PROBS OF ZERO SUCCESSES WITH ALPHA VALUES
      IF(SUMU-ALPHA(IALPHA))5025,5020,5020                              [5010]
      PROB(IALPHA)=0.0                                                  [5020]
      BETA(IALPHA+1)=BETA(IALPHA)
      IALPHA=IALPHA+1
      IF(IALPHA-4)5010,5010,132
C     CALCULATE BETA VALUES
      BETA(IALPHA)=BETA(IALPHA)-SUML                                    [5025]
      DO 5040 I=1,I3                                                    [5030]
      TERMU(I)=SUMU
      TERML(I)=SUML                                                     [5040]
      K3=I3-1
      IF (K3.EQ.0) GO TO 5120
      DO 5080 K=1,K3
      DO 5050 I=1,I3
      TERMU(I)=TERMU(I)*RIU(J)/QIU(J)
C     SUM PROBS OF (ZERO SUCCESSES AND MIDDLE TERM SUCCESSES)
      SUMU=SUMU+TERMU(I)
      TERML(I)=TERML(I)*RIL(J)/QIL(J)
      J=J+1
      IF(J.GT.I3)J=1
      CONTINUE                                                          [5050]
C     COMPARE SUM OF MIDDLE TERM PROBS OF SUCCESSES PLUS SUM OF ZERO
C     SUCCESSES (PROBS OF) WITH ALPHA VALUES
      IF(SUMU-ALPHA(IALPHA))5075,5070,5070                              [5060]
      PK=K                                                              [5070]
      PROB(IALPHA)=PK/TRIALS(I3)
      BETA(IALPHA+1)=BETA(IALPHA)
      IALPHA=IALPHA+1
      IF(IALPHA-4)5060,5060,132
      DO 5078 L=1,I3                                                    [5075]
C     CALCULATE BETA VALUES
      BETA(IALPHA)=BETA(IALPHA)-TERML(L)                                [5078]
      CONTINUE                                                          [5080]
      RTERMU=1.0                                                        [5120]
      RTERML=1.0
      DO 5090 I=1,I3
      RTERMU=RTERMU*RIU(I3)                                             [5090]
C     SUM PROBS OF ALL SUCCESSES
      SUMU=SUMU+RTERMU
C     COMPARE SUM OF PROBS OF (ZERO, MIDDLE TERMS, AND ALL SUCCESSES)
C     WITH ALPHA VALUES
      IF(SUMU-ALPHA(IALPHA))3105,5110,5110                              [5100]
      PK=I3                                                             [5110]
      PROB(IALPHA)=PK/TRIALS(I3)
      BETA(IALPHA+1)=BETA(IALPHA)
      IALPHA=IALPHA+1
      IF(IALPHA-4)5100,5100,132
      WRITE (6,1003)TRIALS(I),(PROB(I),I=1,4),(BETA(J),J=1,4)           [132]
      FORMAT(F16.0,F12.5,3F13.5,1X,4F13.5)                              [1003]
      GO TO 2003
      WRITE (6,1004)TRIALS(I),IALPHA                                    [3105]
      FORMAT(/F16.0,5X,20HCONDITIONS ON ALPHA(,I1,                      [1004]
     138H) NOT SATISFIED AFTER SUMMING OF TERMS)
C     CHECK TO SEE WHICH ALPHA VALUE WAS NOT EXCEEDED
      IF(IALPHA-1)2003,2003,141
      IF(IALPHA-2)151,151,152                                           [141]
      WRITE (6,1005)PROB(1),BETA(1)                                     [151]
      FORMAT(16X,F12.5,40X,F13.5)                                       [1005]
      GO TO 2003
      IF(IALPHA-3)161,161,162                                           [152]
      WRITE (6,1006)PROB(1),PROB(2),BETA(1),BETA(2)                     [161]
      FORMAT(16X,F12.5,F13.5,27X,2F13.5)                                [1006]
      GO TO 2003
      WRITE (6,1007)(PROB(I),I=1,3),(BETA(I),I=1,3)                     [162]
      FORMAT(16X,F12.5,2F13.5,14X,3F13.5)                               [1007]
      CONTINUE                                                          [2003]
      WRITE (6,1009)
      FORMAT(1H1 50X25HRELIABILITY GROWTH CURVES///                     [1009]
     135X18HUPPER GROWTH CURVE,32X18HLOWER GROWTH CURVE//
     230X2HRI16X2HQI30X2HRI16X2HQI//9X9HNUMBER OF /11X6HTRIALS)
      WRITE (6,1008)(TRIALS(I),RIU(I),QIU(I),RIL(I),QIL(I),I=1,MAXTRI)
      FORMAT(F16.0,10X,F7.5,12X,F7.5,25X,F7.5,10X,F7.5)                 [1008]
      CONTINUE                                                          [2002]
      CONTINUE                                                          [2001]
      GO TO 10
      END
Hand Calculations


In order to ascertain the validity of the program logic, hand calculations
were performed and compared with the computer run as given
in Table 2. Due to the high alpha values, the example used is not recommended
for other than comparing with hand calculations. As will be seen,
the hand calculation ends at n = 3 due to cumbersome calculations for
n > 3.

The hand calculations proceed as follows:

1.  Calculate $R_i$ and $Q_i$ values for the upper growth curve.

2.  For a particular n, calculate probabilities of successes from zero
successes through all successes.

3.  Compare the probabilities calculated in step two above with
the preassigned alpha values.

4.  If the probabilities in step 3 are equal to, or exceed alpha,
determine the number of successes of the term which determined if the
alpha was met or exceeded, and divide this number by n.

[Table 2. Copy of a computer run]


The actual hand calculations follow:


UPPER GROWTH CURVE

$R_0 = .25$, $R_N = .90$, N = 10, $R_i = \frac{1}{1 + A e^{-Bi}}$

$A = \frac{1}{.25} - 1.00 = 3.0$

$B = -\frac{1}{N} \log_e \frac{R_0 (1.00 - R_N)}{R_N (1.00 - R_0)} = .3297$

 i      B       Bi      $e^{-Bi}$     A     $1 + Ae^{-Bi}$    $R_i$     $Q_i$

 0    .3297    0        1            3.0    4.0              .250      .750
 1      "      .3297    .71913        "     3.15739          .31672    .68328
 2      "      .6594    .51716        "     2.55148          .39193    .60807
 3      "      .9891    .37191        "     2.11157          .47358    .52642

LOWER GROWTH CURVE

$R'_0 = .2$, $R'_N = .8$, N = 10, $R'_i = \frac{1}{1 + A' e^{-B'i}}$

$A' = 4$

$B' = .2773$

 i      B'      B'i     $e^{-B'i}$    A'    $1 + A'e^{-B'i}$   $R'_i$    $Q'_i$

 0    .2773    0        1            4      5                .200      .800
 1      "      .2773    .75782       "      4.03128          .24806    .75194
 2      "      .5545    .57435       "      3.29740          .30327    .69673
 3      "      .8310    .43526       "      2.74104          .36483    .63517

Comparison with Table 2 shows the hand calculated values of $R_i$ and $Q_i$ to
closely approximate the values printed out by the computer; the difference
being the computer retains more decimal places than used in the hand
calculations.
The calculation of the "Probabilities of Success" as outlined in the
main body of Table 2 follows, using probabilities from the upper growth
curves. Each row expands $\prod_{i=1}^{n} (R_i \xi + Q_i)$, in which the coefficient of
$\xi^k$ is the probability of k successes; the four right-hand columns give a/n
for each alpha.

                                                                          Alpha
 n   Expansion                                                       .60   .70   .80   .90

 1   $R_1 \xi^1 + Q_1 \xi^0$
     .31672 + .68328                                                 0/1   1/1   1/1   1/1

 2   $R_1 R_2 \xi^2 + (R_2 Q_1 + R_1 Q_2) \xi^1 + Q_1 Q_2 \xi^0$
     .124 + (.268 + .193) + .415                                     1/2   1/2   1/2   2/2

 3   $R_1 R_2 R_3 \xi^3 + [(R_2 R_3 Q_1) + (R_1 R_3 Q_2) + (R_1 R_2 Q_3)] \xi^2$
     $+ [(R_3 Q_1 Q_2) + (R_2 Q_1 Q_3) + (R_1 Q_2 Q_3)] \xi^1 + Q_1 Q_2 Q_3 \xi^0$
     .059 + [.1268 + .0912 + .0653]
     + [.1968 + .1410 + .1014] + .2187                               1/3   2/3   2/3   2/3

It must be remembered that the summation of probabilities begins with zero
successes and runs to the number of successes which determines that the alpha of
interest has been equalled or exceeded.

The Beta values proceed as follows, working with the lower growth
curve values of $R'_i$ and beginning with the "a" value determined from the
probabilities of success calculations. For each n, $\text{Beta}$ is the sum of the
terms of the expansion from $\xi^a$ through $\xi^n$:

                                                                          Alpha
 n   Terms summed                                                    .60    .70     .80     .90

 1   Sum terms from 0 successes to 1                                 1.000
     Sum terms from 1 success to 1                                          .24805  .24805  .24805

 2   Sum terms from 1 success to 2
     $(R'_2 Q'_1 + R'_1 Q'_2) \xi^1 + R'_1 R'_2 \xi^2$
     (.228 + .173) $\xi^1$ + .075 $\xi^2$                            .476   .476    .476
     Sum terms from 2 successes to 2                                                        .075

 3   Sum terms from 1 success to 3
     $[(R'_3 Q'_1 Q'_2) + (R'_2 Q'_1 Q'_3) + (R'_1 Q'_2 Q'_3)] \xi^1$
     $+ [(R'_2 R'_3 Q'_1) + (R'_1 R'_3 Q'_2) + (R'_1 R'_2 Q'_3)] \xi^2 + R'_1 R'_2 R'_3 \xi^3$
     [.191 + .145 + .110] $\xi^1$ +
     [.083 + .063 + .048] $\xi^2$ + .027 $\xi^3$ = .667              .667
     Sum terms from 2 successes to 3
     $[(R'_2 R'_3 Q'_1) + (R'_1 R'_3 Q'_2) + (R'_1 R'_2 Q'_3)] \xi^2 + R'_1 R'_2 R'_3 \xi^3$
     [.083 + .063 + .048] $\xi^2$ + .027 $\xi^3$ = .221                     .221    .221    .221

Thus the logic of the computer program listing is proven to be valid
since the hand calculations agree with the results shown in Table 2, and
Table 2 is an exact copy of a computer run.
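The hand expansions for n = 3 can be reproduced by enumerating every success/failure outcome directly. The short Python sketch below is an independent check under the assumption that the trials are independent with the hand-calculated $R_i$ values listed earlier; the function name `success_terms` is hypothetical and the enumeration is not the method used in the Fortran program.

```python
from itertools import product


def success_terms(reliabilities):
    """Return P(k successes), k = 0..n, by enumerating all 2**n outcomes."""
    n = len(reliabilities)
    terms = [0.0] * (n + 1)
    for outcome in product((0, 1), repeat=n):
        p = 1.0
        for r, success in zip(reliabilities, outcome):
            p *= r if success else (1.0 - r)
        terms[sum(outcome)] += p
    return terms


# Hand-calculated reliabilities R_1, R_2, R_3 from the two growth curves.
upper = [0.31672, 0.39193, 0.47358]
lower = [0.24806, 0.30327, 0.36483]

if __name__ == "__main__":
    print("upper-curve terms, 0..3 successes:",
          [round(t, 4) for t in success_terms(upper)])
    lower_terms = success_terms(lower)
    print("Beta, 1 or more successes:", round(sum(lower_terms[1:]), 3))
    print("Beta, 2 or more successes:", round(sum(lower_terms[2:]), 3))
```

Enumeration grows as 2**n, so it is useful only as a cross-check for small n, which is the same limitation the hand calculation runs into.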
ON FITTING OF THE WEIBULL DISTRIBUTION WITH
NON-ZERO LOCATION PARAMETER AND SOME APPLICATIONS


Oskar M. Essenwanger
Aerophysics  Branch 
Physical  Sciences  Laboratory 
Research  and  Development  Directorate 
U.S.  Army  Missile  Command 
Redstone  Arsenal,  Alabama 


ABSTRACT. The Weibull distribution is difficult to fit when the
location parameter is different from zero.

Although for engineering problems a graphical method for determination
of the parameters exists, an application to numerous data samples is very
time consuming and elaborate, the more so when the location parameter is
different from zero.


Two methods are presented, applicable to computer usage. One method
is based upon the moments of the distribution and the second upon a curve
fitting procedure. Although neither method utilizes the maximum likelihood
principle, application in practical engineering problems may be quite
adequate.

Examples of application are given, and the analytical curves from the
two methods are compared with observed distributions. Emphasis is placed
on close approximation of the 90, 95 and 99% value of wind speed and wind
shear distributions.

I. INTRODUCTION. The Weibull distribution (1) has become very popular
for many statistical problems in recent times. This is understandable if
one considers that this distribution offers several conveniences.

The distribution form

$$F(x) = 1 - e^{-\left(\frac{x - \gamma}{\theta}\right)^{\beta}} \tag{1}$$

shows 3 parameters, $\beta$ determining the shape, $\theta$ defining the scale and $\gamma$
establishing the location of reference. The popularity of this distribution
is based upon a number of attractive features. The distribution is versatile
and can assume various types of other distributions. The application does
not necessarily require a specific statistical model, although in life testing
a typical case of utilization arises. It is a cumulative distribution, where
the threshold can be readily computed directly rather than by an elaborate
process of integration as in most other types of distributions. Its three
parameters make it more adaptable to many empirical frequency distributions
in comparison with two parameter fittings. Difficulties arise, however, if
all 3 parameters must be determined.
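As an illustration of the remark that thresholds can be computed directly, the following Python sketch evaluates the three-parameter form and its inverse in closed form. The function names and the numerical parameter values in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def weibull_cdf(x, beta, theta, gamma=0.0):
    """F(x) = 1 - exp(-((x - gamma) / theta) ** beta) for x > gamma, else 0."""
    if x <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((x - gamma) / theta) ** beta))


def weibull_threshold(p, beta, theta, gamma=0.0):
    """Invert the CDF: return the x at which F(x) = p, for 0 <= p < 1."""
    return gamma + theta * (-math.log(1.0 - p)) ** (1.0 / beta)


if __name__ == "__main__":
    # Hypothetical parameters: shape 2, scale 3, location 1.
    for p in (0.90, 0.95, 0.99):
        x = weibull_threshold(p, beta=2.0, theta=3.0, gamma=1.0)
        print(f"{int(p * 100)}% value: x = {x:.4f}")
```

No numerical integration is needed at any point, which is the convenience referred to above.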

Usually it is assumed that $\gamma = 0$ and then no problems exist for
adequate fitting of the distribution. Maximum likelihood (2, 3) or other
methods (4) are readily available. Limitation to a 2 parameter fit restricts
the utilization of the distribution and does not render its full capacity.
Curve fitting is rather difficult, however, if $\gamma \neq 0$. Two methods are
therefore presented in the following, by which $\gamma$ can be determined in
objective ways, although the methods are not based upon the maximum likelihood
principle. For many engineering applications, however, the two methods, which
can also be adapted for computer use, may be quite satisfactory.

One method is derived for the moments fit and does not need the
frequency distribution. The second method requires a frequency distribution,
although not equal class intervals, as usually assumed by maximum likelihood
methods. This second method is based upon a curve fitting procedure.

II. THE MOMENTS FIT. A moments fit of a distribution is in most cases
very convenient. The moments of a distribution can be easily computed, and
it is not necessary that the total frequency distribution is known for a
moments fit. Usually an analytical solution for the parameter computation
can be derived. Unfortunately this form of explicit solution for the Weibull
distribution with $\gamma \neq 0$ is not trivial, as the $\beta$ in the moments fit appears
implicitly in the $\Gamma$ function. One finds for the Weibull distribution

$$E(X) = \bar{x} = \theta \cdot a + \gamma \tag{2}$$

$$\sigma^2 = \theta^2 (b - a^2) \tag{3}$$

$$\mu_3 = \theta^3 (c - 3ab + 2a^3) \tag{4}$$

where

$$a = \Gamma\left(1 + \frac{1}{\beta}\right) \tag{5}$$

$$b = \Gamma\left(1 + \frac{2}{\beta}\right) \tag{6}$$

$$c = \Gamma\left(1 + \frac{3}{\beta}\right) \tag{7}$$

and $\mu_3$ denotes the third moment with reference to the mean. This leads to
the equation

$$\frac{\mu_3}{\sigma^3} = \frac{c - 3ab + 2a^3}{(b - a^2)^{3/2}} \tag{8}$$

In equation (8) the $\beta$ is the only unknown, although it appears in
implicit form. Tables for determining $\beta$ can be found in a recent report by
the author (5). After the $\beta$ has been obtained,

$$\theta^2 = \frac{\sigma^2}{b - a^2} \tag{9}$$

Tables for the denominator with reference to $\beta$ are given in the above
mentioned report (5). Finally

$$\gamma = \bar{x} - \theta \cdot a \tag{10}$$

The respective numerical value of "a" has also been included in the above
referenced tables (5).

Thus the moments method is relatively simple. Equation (8) can also
be adapted for solutions by high speed electronic computers with subsequent
calculations of $\theta$ and $\gamma$.

The  moments  fit  may  have  practical  value  in  engineering  application. 
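A computer solution of equation (8) can be sketched as follows. This Python example is only an illustration, not the author's program or tables: it solves (8) for $\beta$ by bisection, assuming that the skewness ratio decreases steadily as $\beta$ increases over the bracket 0.2 to 50, and then applies (9) and (10). The function names and the bracket limits are assumptions made for the sketch.

```python
import math


def weibull_moments(beta, theta, gamma):
    """Mean, variance and third central moment from equations (2) to (7)."""
    a = math.gamma(1.0 + 1.0 / beta)
    b = math.gamma(1.0 + 2.0 / beta)
    c = math.gamma(1.0 + 3.0 / beta)
    mean = theta * a + gamma
    variance = theta ** 2 * (b - a * a)
    mu3 = theta ** 3 * (c - 3.0 * a * b + 2.0 * a ** 3)
    return mean, variance, mu3


def weibull_skewness(beta):
    """Right-hand side of equation (8)."""
    a = math.gamma(1.0 + 1.0 / beta)
    b = math.gamma(1.0 + 2.0 / beta)
    c = math.gamma(1.0 + 3.0 / beta)
    return (c - 3.0 * a * b + 2.0 * a ** 3) / (b - a * a) ** 1.5


def moments_fit(mean, variance, mu3, lo=0.2, hi=50.0, tol=1e-12):
    """Return (beta, theta, gamma) from the first three moments."""
    target = mu3 / variance ** 1.5
    if not (weibull_skewness(hi) <= target <= weibull_skewness(lo)):
        raise ValueError("skewness lies outside the assumed bracket for beta")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_skewness(mid) > target:
            lo = mid      # skewness still too large, beta must increase
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    a = math.gamma(1.0 + 1.0 / beta)
    b = math.gamma(1.0 + 2.0 / beta)
    theta = math.sqrt(variance / (b - a * a))   # equation (9)
    gamma = mean - theta * a                    # equation (10)
    return beta, theta, gamma


if __name__ == "__main__":
    moments = weibull_moments(beta=2.0, theta=3.0, gamma=1.0)
    print(moments_fit(*moments))
```

In practice the three input moments would be sample moments, so the recovered parameters carry the sampling error of the third moment in particular.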

III. THE "STRAIGHT LINE" FIT. Reservations against the moments fit
are largely based upon two objections. First, the moments fit is not always
a maximum likelihood fit, which is the modern trend in statistics. Those who
oppose the moments fit for that reason will not use this type of solution,
although utilization may provide similar results for practical purposes.

Therefore no further discussion of this argument is necessary here. The
second objection is based upon the fact that only 3 pieces of information from
the data are employed, while more information may be available. This is true
especially when the frequency distribution is given or known.

Thus engineers sometimes prefer graphical methods as demonstrated
e.g. by Plait (6) or Berrettoni (7). The graphical method as introduced
by Berrettoni (7) attracts because of its simplicity for 2 parameters, when
$\gamma = 0$. If $\gamma \neq 0$, then the distribution becomes a curved line on log/log
paper instead of an easily determined straight line (see Figure at the end
of this article). As Berrettoni suggests, one must determine $\gamma$ by trial.

With $\gamma$ known, the Weibull distribution appears as a straight line on log/log
paper, and $\beta$ and $\theta$ can be obtained readily. The cumbersome procedure is to
determine $\gamma$ by this graphical method and make a judgment when the transformed
curve is considered a straight line.

By this method $\gamma$ can only be determined to a certain degree of accuracy, and
arguments about differences between moments and maximum likelihood fit become
the more irrelevant. The idea behind Berrettoni's method is certainly to
employ more information on the distribution than given by the moments. It
must therefore be possible to derive an objective way of trial to determine
$\gamma$ and at the same time bring the inaccuracy under a certain limit, which can
be arbitrarily selected.

In order to derive the equations for this procedure, start with the
transformed equation (1) with $\bar{F} = 1 - F(x)$. Then

$$\ln\left[\ln \frac{1}{\bar{F}}\right] = Y = \beta \ln (x - \gamma) - \beta \ln \theta \tag{11}$$

The goal is a straight line. Thus

$$Y = a_1 z + a_0 \tag{12}$$

and

$$z = \ln (x - \gamma) \tag{13}$$

$\gamma \neq 0$ produces a curved line (see Figure at end of article) and

$$Y = a_0 + a_1 z + a_2 z^2 + \dots + a_n z^n \tag{14}$$

or in orthogonal functions

$$Y_i = A_0 + A_1 \varphi_{1i} + A_2 \varphi_{2i} + \dots + A_n \varphi_{ni} \tag{15}$$

Then the trial value of $\gamma$ approaches the true $\gamma$ if $A_j \to 0$ for $j \geq 2$.
To meet this condition, a test is necessary for $A_2$ only, since all higher
order coefficients must be zero at the same time. Otherwise one would not find
a straight line. One can therefore restrict the computations to


S' V  Kht 
*  )  1 


(16) 


where $z_i$ in this orthogonalized system at equidistant intervals corresponds
to

$$z_i = \ln (x_i - \gamma) = \ln z'_i \tag{17}$$

More details can be found in a separate report by the author (5), where examples for the solution are given.
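The trial procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: it assumes the Weibull form $F(x) = 1 - \exp\{-[(x-\gamma)/\theta]^\beta\}$, builds the second-order orthogonal function directly for arbitrarily spaced points, and searches for the trial location parameter by bisection. The function names and the lower search bound `gamma_low` are hypothetical.

```python
import math

def plot_coordinates(x, F, gamma):
    """Transform (x, F) to z = ln(x - gamma), Y = ln ln(1 / (1 - F))."""
    z = [math.log(xi - gamma) for xi in x]
    Y = [math.log(-math.log(1.0 - Fi)) for Fi in F]
    return z, Y

def second_order_coefficient(z, Y):
    """Coefficient of the second-order orthogonal function (curvature of the plot)."""
    n = len(z)
    z_mean = sum(z) / n
    u = [zi - z_mean for zi in z]
    s2 = sum(ui ** 2 for ui in u)
    s3 = sum(ui ** 3 for ui in u)
    # phi2 is orthogonal to the constant and to the linear term on these points.
    phi2 = [ui ** 2 - (s3 / s2) * ui - s2 / n for ui in u]
    return sum(Yi * pi for Yi, pi in zip(Y, phi2)) / sum(pi ** 2 for pi in phi2)

def straight_line_fit(x, F, gamma_low, tol=1e-10):
    """Find gamma that removes the curvature, then read beta and theta off the line.

    Assumes 0 < F < 1 at every point and that the plot is concave at gamma_low
    (trial value too small) and convex just below min(x) (trial value too large).
    """
    lo, hi = gamma_low, min(x) - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if second_order_coefficient(*plot_coordinates(x, F, mid)) < 0.0:
            lo = mid   # concave plot: trial gamma is too small
        else:
            hi = mid   # convex plot: trial gamma is too large
    gamma = 0.5 * (lo + hi)
    z, Y = plot_coordinates(x, F, gamma)
    n = len(z)
    z_mean, Y_mean = sum(z) / n, sum(Y) / n
    slope = (sum((zi - z_mean) * (Yi - Y_mean) for zi, Yi in zip(z, Y))
             / sum((zi - z_mean) ** 2 for zi in z))
    intercept = Y_mean - slope * z_mean
    beta = slope                          # slope of the straight line
    theta = math.exp(-intercept / beta)   # intercept equals -beta * ln(theta)
    return gamma, beta, theta
```

The tolerance `tol` plays the role of the arbitrarily selected limit on the inaccuracy of the location parameter. The points need not be equidistant, but points with F = 0 or F = 1 must be left out, since the transformation is infinite there.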

IV. COMPARISON OF METHODS. Before applications are presented it may be appropriate to discuss some technical details and limitations of the two methods.

It has been previously stated that neither method is derived from the maximum likelihood principle, and both may therefore be of no interest to the theoretical statistician or may be considered as substitute methods. The moments fit is attractive because it is straightforward, with a relatively simple way of computing the parameters. Only three moments need to be known. From the engineering point of view the "straight line method" comes closer to a graphical type of solution and renders the better curve fitting. The frequency distribution need not be given in equal class intervals. This is quite convenient, but the frequency distribution itself is required, in contrast to the moments fit.

One limitation can be found, however, in the exclusion of $F(x_0) = 0$. This leads to $\ln(\ln 1)$ in equation (11), which, being infinite, must be eliminated. The question therefore arises how close to $F(x_0) = 0$ one can go, or whether the first x should be omitted. In Table I a survey is given for 3 data samples, for which the Weibull distribution has been established as being appropriate. The first column in Table I lists the parameters which Berrettoni (7) has derived by his graphical method. The second column represents the moments solution. The subsequent columns reflect the parameters for the straight line method under various conditions.

It is self-explanatory that in the column "without $F(x_0)$" the origin point has been omitted. The other columns show how the parameters change if $F(x_0)$ = 0.001, 0.0005 or 0.0001. It can be concluded that $\gamma$ responds somewhat to the change of the origin, and with $\gamma$ the other parameters will vary (see especially case 3). It can be noted, too, that the solutions without $F(x_0)$ agree well with Berrettoni's results. The small differences can easily be explained by inaccuracies between graphical and computational methods. Considering that neither the graphical solution nor the straight line method is a maximum likelihood solution, these small differences become even more insignificant.

In order to test the differences between the methods for significance, one can apply the Kolmogorov-Smirnov test (8). None of the deviations proved to be statistically significant. More details can be found in a forthcoming article by the author (9).
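As a rough illustration of such a check, the sketch below computes the largest absolute deviation between an observed and a fitted cumulative distribution and compares it with the usual large-sample 5% critical value, 1.36/sqrt(n). The function names are hypothetical, the sample size n must be supplied by the user, and the critical value is an approximation that is not taken from this article.

```python
import math

def ks_statistic(observed_cdf, fitted_cdf):
    """Largest absolute difference between two cumulative distributions
    tabulated at the same x values."""
    return max(abs(o - f) for o, f in zip(observed_cdf, fitted_cdf))

def ks_critical_value_95(n):
    """Large-sample approximation of the 5% critical value for sample size n."""
    return 1.36 / math.sqrt(n)

def significant_at_95(observed_cdf, fitted_cdf, n):
    """True if the maximum deviation exceeds the approximate 5% critical value."""
    return ks_statistic(observed_cdf, fitted_cdf) > ks_critical_value_95(n)
```

With cumulative frequencies given only at class boundaries, the maximum deviation can be understated, so a result close to the critical value should be read with caution.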

Since case 3 displayed the largest differences, the cumulative distribution was computed for various postulations and is summarized in Table II. The first column (after the variable x) contains the observed distribution. Berrettoni's solution (7) follows next. Subsequently the computed frequencies for the "moments" and the "straight line" method are listed. The underlining of numbers indicates the maximum deviation for all presented curves in that particular line. This example is quite typical. Although none of the differences from the observed value reaches statistical significance at the 95% level, it can be seen that the 3 methods approximate the observed distribution in specific ways. The moments method fits more closely towards the maximum values, while the straight line procedure deviates less at the minimum values. The graphical solution (Berrettoni) shows its maximum deviation in the center.


The other 6 columns, experimenting with varying $x_0$ and the related cumulative frequency as outlined in the heading, lie somewhat in between, except for $x_0 = 1.1$ with $F(x_0) = 0.001$. This condition exhibits the largest deviation. It shows that the frequency $F(x_0)$ should be kept as close to zero as possible, although no specific value can be established.


V. APPLICATION TO WIND SPEED AND WIND SHEAR DATA. In an earlier article the author has introduced the negative binomial distribution, which functions quite satisfactorily for frequency distributions of wind speed (10). Cumulative threshold values, however, are very cumbersome to compute for the negative binomial frequency distribution. The question was thus raised whether the Weibull distribution may be an adequate replacement. Cumulative thresholds can be easily obtained from the Weibull distribution.
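To illustrate why the thresholds are easy to obtain, the sketch below inverts the cumulative distribution in closed form. It assumes the three-parameter form $F(x) = 1 - \exp\{-[(x-\gamma)/\theta]^\beta\}$, which agrees with the transformed equation (11); the function names are hypothetical.

```python
import math

def weibull_cdf(x, gamma, beta, theta):
    """Cumulative Weibull distribution with location gamma, shape beta, scale theta."""
    if x <= gamma:
        return 0.0
    return 1.0 - math.exp(-((x - gamma) / theta) ** beta)

def weibull_threshold(p, gamma, beta, theta):
    """Value x that is not exceeded with cumulative frequency p (0 < p < 1)."""
    return gamma + theta * (-math.log(1.0 - p)) ** (1.0 / beta)
```

For example, `weibull_threshold(0.99, gamma, beta, theta)` gives the 99% threshold directly once the three parameters have been fitted, with no summation over a tabulated distribution.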

Table III displays typical results of fitting the Weibull distribution to wind data. It can again be recognized that the moments fit approximates the maximum wind speeds more closely, while the straight line method adjusts better to lower wind speeds. In general, the Kolmogorov-Smirnov test shows statistically significant differences between observed and analytical values computed from the Weibull distribution (see details in 11). This shows that the Weibull distribution is not the most suitable form to fit wind speed or shear data. A limited application, however, turned out to be quite valuable.

The engineer is often faced with the problem of determining 90 to 99% values when no detailed distribution is given. Since the moments fit of the Weibull distribution has given good results for the maximum wind speeds, an attempt was made to determine analytically the 90, 95 and 99% wind speed and wind shear values and compare them with the observed ones. The results are presented in Tables IV thru VII.

In  Table  IV  three  methods  are  compared  for  computing  90,  95  and  99% 
thresholds  for  wind  speed  and  wind  shear  values.  Montgomery  was  selected, 
as  it  illustrates  typical  results.  The  three  threshold  values  were 
analytically  computed,  employing  the  negative  binomial,  bivariate  and  Weibull 
distribution  (moments  fit).  Analytical  distributions  for  negative  binomial 
and  bivariate  distribution  are  described  in  detail  in  a  recent  report  by  the 
author (10).

The thresholds were computed at 1 km altitude intervals up to 31 km for all months. The (linear) correlation between observed and analytical value was then computed, as exhibited in the top part of Table IV. This gives evidence that the Weibull distribution is equivalent to the negative binomial except for the wind shear at the 99% threshold. The Weibull distribution is even better than the bivariate distribution, which is generally agreed to be the proper distribution form for wind speed and shear.

The central part of Table IV lists the mean of $\Delta$, the difference between analytical and observed wind speed or shear. Although the observed values for the wind speed appear to be systematically higher than the analytical values, the bias is smallest for the Weibull distribution. No bias is apparent for the wind shear.

The bottom part of Table IV deals with the standard deviation of the difference $\Delta$. Again, the results are very favorable for the Weibull distribution except for the 99% wind shear estimate.
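The three summary measures used in Table IV can be computed as in the following sketch. It is an illustration only: the function name is hypothetical, and the standard deviation uses the n - 1 divisor, which is an assumption since the article does not state which divisor was used.

```python
import math

def compare_thresholds(observed, analytical):
    """Return (correlation, mean of delta, standard deviation of delta),
    where delta = analytical - observed for each altitude level and month."""
    n = len(observed)
    delta = [a - o for a, o in zip(analytical, observed)]
    mean_delta = sum(delta) / n
    # Sample standard deviation (n - 1 divisor) -- an assumption.
    std_delta = math.sqrt(sum((d - mean_delta) ** 2 for d in delta) / (n - 1))
    mean_obs = sum(observed) / n
    mean_ana = sum(analytical) / n
    cov = sum((o - mean_obs) * (a - mean_ana) for o, a in zip(observed, analytical))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_ana = sum((a - mean_ana) ** 2 for a in analytical)
    correlation = cov / math.sqrt(var_obs * var_ana)
    return correlation, mean_delta, std_delta
```

A negative mean of delta corresponds to analytical values that lie below the observed ones.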

Since correlation, mean values or standard deviations can sometimes be misleading, the frequency distributions of $\Delta$ are presented in Tables V, VI, and VII. This also gives a survey of the maximum deviations to be expected for the various analytical approaches. Table V contains the frequency distribution of $\Delta$ for the 90% threshold, where the Weibull distribution looks very good. Less than 5% of the data exceed ± 2 m/sec for the wind and ± 1 m/sec per km for the wind shear. This is entirely in the range of measurement accuracies.

The differences are higher for the 95% threshold of the wind speed, but still under 10% of the data fall outside the above cited range. The amount is far higher for the negative binomial or the bivariate distribution. The wind shear differences are equivalent for all three types of analytical forms at the 95% threshold.

Finally the frequency distribution of the differences $\Delta$ for the 99% threshold is given. Although the range is extended compared with the previous thresholds, the Weibull distribution still displays the smallest scatter of all three methods for the wind and could be considered equivalent to the negative binomial for the wind shear. This may be proof enough that the Weibull distribution could be adequately used for practical purposes in the analytical approximation of 90 to 99% thresholds.

VI. CONCLUSIONS. Two methods for fitting the Weibull distribution with non-zero location parameter have been discussed. One method, based upon the first 3 moments of the distribution, provides a simple way of obtaining the basic input for determining the parameters of the Weibull distribution, although the solution necessitates a computer or a table as derived by the author (5).
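For readers without access to the tables of (5), a moments fit can be sketched numerically as follows. This is the textbook method of moments for the three-parameter Weibull distribution and is not necessarily identical to the author's procedure: the skewness depends on the shape parameter alone, so the shape is found by bisection, and scale and location then follow from the standard deviation and the mean. The function names and search bounds are hypothetical.

```python
import math

def weibull_skewness(beta):
    """Skewness of a Weibull distribution as a function of the shape parameter."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    g3 = math.gamma(1.0 + 3.0 / beta)
    variance = g2 - g1 * g1
    return (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / variance ** 1.5

def moments_fit(mean, std, skew, beta_low=0.2, beta_high=50.0, tol=1e-12):
    """Return (gamma, beta, theta) matching the first three moments.

    Relies on the skewness decreasing as beta increases; the target skewness
    must lie between the values at beta_high and beta_low.
    """
    lo, hi = beta_low, beta_high
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_skewness(mid) > skew:
            lo = mid   # skewness still too large: increase beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    theta = std / math.sqrt(g2 - g1 * g1)   # scale from the standard deviation
    gamma = mean - theta * g1               # location from the mean
    return gamma, beta, theta
```

Only the mean, standard deviation and skewness of the sample enter the computation; the frequency distribution itself is not needed.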

A further method is based upon a curve fitting procedure. The property of the Weibull distribution to delineate a straight line in log/log scale for known location parameters is the fundamental principle employed in solving for the parameters. The latter method requires that the frequency distribution be known, although not at equidistant intervals. In turn, more information (namely all known frequency points) is utilized by this curve fitting procedure, in contrast to only the 3 moments for the moments fit.

Both methods are suitable for determining the Weibull parameters without electronic computers, while the iterative procedure for a maximum likelihood solution cannot be processed without computer help. The moments fit, however, needs the tables derived by the author (5) if no computer is available. Thus both methods may prove beneficial to the engineer for a quick solution with a limited number of samples, although they are not restricted to a small number.

It has been shown that the moments fit in most cases represents the better fit towards the maximum values, while the curve fitting procedure puts more weight on the proper approximation of the minimum values. This could be changed, however, by weighting the frequency points of the distribution for the curve fitting method.

Finally  an  application  of  the  Weibull  distribution  for  wind  and  wind 
shear  data  is  shown.  Although  the  Weibull  distribution  has  limited  application 
for  wind  and  wind  shear,  the  moments  fit  proved  to  be  satisfactory  to  represent 
90  to  99%  thresholds. 
TABLE I

Comparison of Parameter Estimation for the Weibull Distribution

[Table body not recoverable from the source.]
TABLE II

... of Weibull Distribution for Various Methods
for Data of Berrettoni's Table ...

[Table body not recoverable from the source. Legible column headings: x; Obs.; Berr.; Mom.; Straight Line Method, with $x_0 = 1.1$ and $x_0 = 2.0$.]
TABLE III

Wind Speed (m/sec) Comparison of ... Distribution

  CFD*     Observed   Moments fit   Straight Line Method
  .0001       3.0        11.9              3.4
  .0100      20.4        19.4             17.9
  .0228      21.7        22.4             22.3
  .0500      23.6        26.1             27.3
  .1000      31.6        30.3             32.6
  .1590      35.3        33.9             36.7
  .5000      47.8        47.5             50.7
  .8410      61.9        61.5             63.2
  .9000      65.8        65.4             66.5
  .9500      69.8        70.3             70.5
  .9772      76.9        75.0             74.3
  ...         ...         ...             77.6
  ...         ...         ...             86.6

* CFD = cumulative frequency distribution

[The last two rows are only partly legible in the source; the remaining legible entries are 80.6 and 91.0.]

TABLE IV

Montgomery (June 1956 - May 1964)
(All Months Combined)

Average Correlation Between Observed and Analytical Thresholds

                   Wind                       Wind Shear
         Neg. Bin.   Biv.    Wei.     Neg. Bin.   Biv.    Wei.
  90%      .982      .996    .996       .977      .986    .991
  95%      .964      .995    .997       .968      .977    .976
  99%      .965      .985    .994       .940      .904    .914

Mean of $\Delta$

  90%     -.50       1.36    -.44       .16       .93     .05
  95%     -.72        .46    -.66       .03       .08     .10
  99%    -1.14      -2.09   -1.01      -.63     -2.59    -.02

Standard Deviation of $\Delta$

  90%     1.51       1.57     .80       .53       .50     .33
  95%     1.65       1.93    1.07       .73       .71     .65
  99%     2.20       3.08    1.80      1.37      2.07    1.60

$\Delta$ = analytical - observed
TABLE V

Montgomery (June 1956 - May 1964)
90%

Frequency Distribution of $\Delta$

[Table body largely illegible in the source. Columns: Wind (Neg. Bin., Biv., Wei.) and Wind Shear (Neg. Bin., Biv., Wei.); rows: class intervals of $\Delta$, beginning with "< -10".]
TABLE VI

Montgomery (June 1956 - May 1964)
95%

Frequency Distribution of $\Delta$

[Table body largely illegible in the source. Columns: Wind (Neg. Bin., Biv., Wei.) and Wind Shear (Neg. Bin., Biv., Wei.); rows: class intervals of $\Delta$ of width 1, beginning with "< -10" and "-9.99 to -9.0".]
TABLE VII

Montgomery (June 1956 - May 1964)
99%

Frequency Distribution of $\Delta$

[Table body largely illegible in the source. Columns: Wind (Neg. Bin., Biv., Wei.) and Wind Shear (Neg. Bin., Biv., Wei.). Only the Neg. Bin. column for wind survives, without its class intervals: 1, 1, 3, 6, 0, 8, 10, 12, 59, 90, 100, 43, 19, 13, 3, 1, 1, 0, 0, 1, 0; total 372.]

$\Delta$ = analytical - observed

Class interval in m/sec for wind and m/sec per 1 km for wind shear.
Professor William G. Cochran of Harvard University received the 1967 Samuel S. Wilks Memorial Medal during the 13th Annual Conference on the Design of Experiments in Army Research, Development and Testing, which was held at Fort Belvoir, Virginia, 1-3 November 1967. Professor Cochran has long been recognized as an international authority for his outstanding contributions to experimental statistics, mathematical statistics, the design and analysis of scientific experiments, teaching activities, stimulation of research workers and personal leadership in the world statistical community.

The  Annual  Design  of  Experiments  Conferences  are  sponsored  by  the  Army 
Mathematics  Steering  Committee  on  behalf  of  the  Office  of  the  Chief  of 
Research  and  Development,  Department  of  the  Army. 

The Wilks Award is given each year to a statistician and is based primarily on his contributions, either recent or past, to the advancement of scientific or technical knowledge in Army statistics, ingenious application of such knowledge, or successful activity in the fostering of cooperative scientific matters which coincidentally benefit the Army, the DOD, and the Government, as did Samuel S. Wilks himself.

The  Award  consists  of  a  medal,  with  a  profile  of  Professor  Wilks  and 
the  name  of  the  Award  on  one  side,  and  the  seal  of  the  American  Statistical 
Association  and  the  name  of  the  recipient  on  the  other  side;  an  honorarium 
related  to  the  magnitude  of  the  award  funds  donated  by  Mr.  Rust;  and  a 
citation. 

With the approval of President Frederick Mosteller of the American Statistical Association (ASA), the Wilks Award Committee for 1967 consisted of:

Professor Robert E. Bechhofer, Cornell University

Dr. Francis G. Dressel, Duke University and the Army Research Office-Durham

Dr. Churchill Eisenhart, National Bureau of Standards

Professor Oscar Kempthorne, Iowa State University

Dr. Alexander M. Mood, U.S. Office of Education

Major General Leslie E. Simon (Ret.), Winter Park, Florida

Dr. Frank E. Grubbs, Ballistic Research Laboratories, Aberdeen Proving Ground, Maryland - Chairman

Professor Cochran was born in Rutherglen, Scotland, and received MA degrees from Glasgow University and Cambridge University. He was a statistician working with the eminent R.A. Fisher at the Rothamsted Experimental Station (England), 1934-1939; Professor of Mathematical Statistics, Iowa University, 1939-1946; Associate Director of the Institute of Statistics, University of North Carolina, 1946-1948; Professor of Biostatistics, School of Hygiene and Public Health, The Johns Hopkins University, 1948-1957; and has been Professor of Statistics at Harvard University since 1957.

In  connection  with  professional  societies  and  committee  activities, 
Professor  Cochran  has  served  as  follows: 
Fellow, American Statistical Association:

President,  1953 

Editor  of  the  Journal  of  the  American  Statistical  Association,  1945-1950 

Fellow,  Institute  of  Mathematical  Statistics: 

President,  1946 

Fellow,  American  Public  Health  Association 

Member,  Biometric  Society: 

President,  1954,1955 

Honorary  Fellow,  Royal  Statistical  Society 

Member,  International  Statistical  Institute: 

Vice-President,  1963-1967 

Fellow,  American  Association  for  the  Advancement  of  Science: 

Vice-President,  1966. 

Committee  activities: 

Chairman,  Panel  of  Advisors  on  Sampling,  U.S.  Bureau  of  the  Census 

Chairman,  Committee  on  Training  in  Epidemiology  and  Biometry,  N.I.H. 

Member,  Advisory  Committee  to  Atomic  Bomb  Casualty  Commission 

Member,  Committee  on  Statistical  Education,  Inter-American  Statistical 
Institute. 

Professor Cochran has published books as follows:

E.J. Russell, J.A. Voelcker, and W.G. Cochran, Fifty years of field experiments at the Woburn Experimental Station. Longmans, Green and Co., London, 1936.

W.G. Cochran and Gertrude M. Cox, Experimental designs. John Wiley and Sons, New York, 1950. Second edition, 1957. Japanese translation, 1954. Spanish translation, 1965.

W.G. Cochran, Sampling techniques. John Wiley and Sons, New York, 1953. Second edition, 1963. Portuguese translation, 1965.

W.G. Cochran, F. Mosteller and J.W. Tukey, Statistical problems of the Kinsey Report. American Statistical Association, Washington, D.C., 1954.

G.W. Snedecor and W.G. Cochran, Statistical methods, 6th edition. Iowa State University Press, 1967.

Professor Cochran is the author of some eighty-five papers which form the very basis for much of the widespread use of statistical techniques, and otherwise represent some of the more significant and widely employed methodology in the entire field of theoretical and experimental statistics.

Indeed, statisticians throughout the world regard Professor Cochran as a "giant" in the field due to his numerous and widespread basic contributions.

Professor Cochran's most recent honor is the presidency of the International Institute. His work in design of experiments recently has dealt with the efficient sequential determination of levels, and more recently he has also been working on the design and analysis of observational studies.

His books have been translated into several languages.

The  citation  to  Professor  Cochran  reads  as  follows: 

"To  Professor  William  G.  Cochran  -  for  continued  research 
on  the  statistical  treatment  of  data,  for  his  highly 
fertile  research  on  the  design  and  analysis  of  experiments 
and  surveys,  for  his  excellent  books  on  the  theory  and 
practice  of  statistical  methodology,  for  his  efforts  in 
the training of statisticians at all levels, and for his
contributions  to  national  and  international  statistical 
societies." 


Professor  Cochran  received  the  third  Wilks  Memorial  Medal  at  the 
banquet  held  in  connection  with  the  Thirteenth  Conference  on  the  Design 
of  Experiments.  Dr.  Frank  E.  Grubbs  made  the  presentation.  The  acceptance 
remarks  of  Professor  Cochran  are  printed  below. 


Chairman  Grubbs,  Ladies  and  Gentlemen: 

I greatly appreciate this high honor. It is especially pleasing because Sam Wilks was the first American statistician whom I ever met. This was in 1933, when I was a graduate student at Cambridge University. Sam came there as a postdoctoral International Fellow, so that I enjoyed over 30 years of his friendship, including working under Sam in 1944 in the Princeton Statistical Research Group of the Office of Scientific Research and Development.

An occasion like this naturally stimulates reflection about one's past work. I might mention one habit, common among statisticians, that has helped me. In consulting, there are always times when I cannot answer the question, and times when I give an answer, but realize after the investigator has left that I distorted his question in order to make it fit into some standard statistical mold. I like to make a note of the difficulty and, so far as my ability permits, to see if anything constructive can be done about it.

This habit also protects effectively against any tendency to develop a swelled head. From time to time I see a new paper that presents the first competent handling of a problem of which I have been ineffectively aware for many years. Now my subconscious self is one of the most unpleasant characters I have ever had to deal with. On such occasions it always surfaces and says, "See, if you had any brains, or had paid a little attention to the advice that I keep giving you subconsciously, you might have cleaned up this problem 20 years ago."

In  thinking  about  the  present  state  of  work  in  statistics  from  the 
viewpoint  of  allocation  of  resources,  we  seem  now  to  be  well  provided  with 
research  manpower  in  mathematical  statistics.  In  fact,  I  have  sometimes 
tried  to  argue  that  there  is  too  much  research  in  mathematical  statistics, 
though  when  I  do  this,  everybody  jumps  on  me.  In  academic  circles,  the  idea 
that  one  can  have  too  much  research  on  any  subject  is  heresy  of  the  worst 
kind. 


As an illustration, consider a problem that has arisen in the last 15 years. In the sampling of institutions like businesses, schools, hospitals, and counties, that vary in size, there is need for a method of selecting a sample without replacement and with probabilities proportional to measures of the sizes of the units. There are two main difficulties. With the simplest methods of selection, it is impossible to compute from the sample an unbiased estimate of the variance; with other methods, the estimate of variance is so unstable that negative estimates of variance can turn up. Secondly, as the sample size increases, it becomes harder to keep the probabilities proportional to size.

The problem is important enough so that under a system of planned resource allocation one could justify assigning three or four good men in different places, and preferably in different countries, to work on it independently. Now Mr. Kenneth Brewer, Director of Methodology, Commonwealth Bureau of Census and Statistics, Canberra, Australia, is currently spending some time with us at Harvard. One of his tasks is to prepare what will be a highly useful comparative and critical review of the methods that have been produced for sampling with probabilities proportional to size without replacement. To date he has found in the literature not 3 or 4 methods, but 34. Indeed, when the latest issue of any journal reaches my desk these days, I hesitate to open it, in case it contains yet another method which Mr. Brewer will have to compare with the current crop of 34.

It almost sounds like too much of a good thing.

In two other aspects of the health of our profession, however, the situation seems less favorable. One aspect concerns mechanisms for ensuring that new and useful statistical techniques are regularly explained to the potential users in language that they can understand; the other, mechanisms by which statisticians are regularly kept informed of unsolved statistical problems encountered by users. The difficulties in this kind of communication are well known. Users have little time to devote to learning statistical techniques and often very limited knowledge of mathematics and probability; work by statisticians on exposition carries little prestige; and the efforts of the statistical profession in this area have been mostly sporadic and voluntary. In this connection, I think that Sam Wilks, after his early years of brilliant and productive research, deliberately chose to sacrifice his future research interests in order to concentrate on organizational problems in the new field of statistics, including problems of communication with ...

... statisticians can learn about questions raised by the users to which current research does not supply an answer. Long may they continue.
DETERMINATION OF TBO BY WEIBULL DISTRIBUTION
USING REPAIRABLE COMPONENTS


John L. Mundy

U.S. Army Aviation Materiel Command
St. Louis, Missouri


ABSTRACT. 

1. The Army Aviation Command has found that a serious discrepancy exists between the figures set by contractors for the life time of critical components and for the Time Between Overhaul (TBO) for noncritical components and the figures actually achieved in practice. For many components only 8% ever reach their rated life time, and only 5% reach their TBO time.

2. In addition, it is necessary to determine the time required to break in systems, if any. This break-in time is sometimes referred to as burning-in.

3. To determine the statistical TBO, life times, and break-in periods, as set by actual field usage, the Weibull probability distribution was applied. The work of Mr. J.H.K. Kao was extended from non-replaceable items to repairable items. Three-phase life was used, consisting of the Infant Mortality period, the Catastrophic or Random Failure period, and the Wearout period. The graphical trial and error method of Mr. Kao was replaced by a Fortran computer program. In addition, the iterative method was streamlined into a deterministic method. This represents a major contribution which reduced the computer time by 85%.

4. Flow charts of the operation have been prepared. The source of data is the DA Form 2410 and DA Form 2408-3.


ACKNOWLEDGEMENTS. Many AVCOM engineers contributed technical assistance to this report. Gratitude is due to Mr. Wm. Brabson, R&D Division, Mr. J.K. Gerdel, Special Studies Office, Mr. D. Burchfield, Quality Assurance Office and Mr. D. Fleming, Quality Assurance Office for supervisory support.

The lengthy Computer Flow Chart was prepared by Mr. M. Ploudre, R&D Division.

Valuable programming assistance is being supplied by Mr. F. Blackshear, Special Studies Office.

The intricacies of the TAERS operations were thoroughly explained by Mr. R. Jesse and Mr. F. Gruaninger, Directorate of Maintenance and Mr. M. Chrlatlanar, General Engineering.

The essential 'hammer and nails' work of hand checking of calculations was done by Mr. M. Ploudre, R&D Division, with the help of Mr. D. Carter, R&D Division.
CONTRIBUTIONS.


1. Use of TAERS to:

a. Break down Failure History to No. of Overhauls.

b. Break down Failure History to No. of Repairs after overhaul.

c. Break down Failure History to Age of item after "N" repairs after "M" overhauls.

d. Report Unfailed as well as Failed Items.

2. Method for Identification of Probability Distribution Composite, under the assumption that the composite consists of not more than 3 other Probability Distributions.

3. Determination of Burn-In, TBO or Finite Life for components, as determined by field usage, instead of by engineers prior to fielding.

4. Two methods for Non-Graphical Determination of Weibull Shift Parameters (Gamma).

5. Complete Elimination of need of Kao Plotting Paper.

THE AUXILIARY WORK TAPE LAYOUT. The Army Aviation Command of St. Louis,
Mo., maintains a Validated Tape File of DA 2410 forms received from the field.
This form is completed by repair personnel, and contains data concerning the
removal and repair of components. This form is one of the class of forms
known as TAERS. Presently about 12 million 2410 records, which have been
validated, are on file.

From  this  tape,  certain  items  were  extracted.  These  items  were  combined 
with  other  items  from  other  tapes  to  create  an  auxiliary  work  tape,  which 
contains  all  the  items  needed  in  this  program.  This  work  tape  layout  is 
shown  in  Figure  1.  This  figure  1  shows  one  record  on  the  tape.  One  record 
will  exist  for  each  2410  report. 

This  program  will  determine  the  "Burn-In"  time,  and  the  "Time  Between 
Overhaul",  (TBO)  from  field  data.  The  field  value  of  TBO  will  be  compared 
with  the  Established  TBO  in  columns  39-42;  and  it  will  also  be  compared  with 
values  of  competitor’s  interchangeable  parts.  The  interchangeable  part 
numbers  are  obtained  from  another  program,  and  printed  in  columns  168-297. 

Since  this  program  analyses  the  failure  times  of  each  component,  the 
four  dates  in  columns  73-88  are  very  important. 

The first date, in 73-76, is the Date of First Installation of a new
serial number (FID). This is obtained from Copy 6 of a 2410 record under
the condition that Copy 1 is missing for the same Document Control No. as
Copy 6. Copy 6 is used for installation reporting, and Copy 1 is used for
removal reporting. Therefore, when an item is new, and originally installed,
a Copy 6 will be made out before Copy 1, under one document number. Later,
when the component is removed, a Copy 6 will be found under a different
control number.

The second date, in 77-80, is the Date of Re-Installation (RID).
This Julian date (RID) is taken from Copy 6, when a Copy 1 is present
under the same document number.

The third Julian date of importance is (ODNR), which is Off-Date, Not
Reinstalled. It is taken from Copy 1, if Copy 6 is absent.

The fourth and last Julian date needed is (ODYR), which is Off-Date,
Yes - Reinstalled. It is taken from Copy 1, if Copy 6 is present.

After this auxiliary tape is created, two other data tapes are made
from it, by various sortings and re-arrangements, to facilitate programming.

One tape will contain records within 25 days to 390 days of the most
recent record. Data within the first 25 days is deleted, to allow for delays
in the mail.

Each tape is then sorted as shown in Fig. 2.

2. SORTING OPERATIONS ON THE WORK TAPE. Fig. 2 shows that the major
grouping is by Part Number, followed by 3 minor groupings - Overhaul Group,
Age Group, and Equal Number of Repair Times. This is followed by sorting
according to the time of failure.

This program is the first that categorises the parts according to the
number of Overhauls. This will determine the failure history of new parts,
such as engines, compared to engines which have been overhauled X-number of
times.

Another first, within AVCOM, is the grouping of parts of the same age,
within the same set. This method will reveal whether failure rates are
constant for components of different ages. This assumption of constant
failure rate is made many times, for no other reason but that it is simple
to use. This analysis will check the validity of this assumption.

The lower right bar shows that unfailed items are also considered.
This is the first time that TAERS has been used to report unfailed as well
as failed items. This method is presented next.

3. HANDLING OF UNFAILED ITEMS. The first step is to identify those items
still in the aircraft.

This is done by analysis of the 4 critical Julian Dates previously
discussed. A sample set of these 4 dates for one serial numbered component
is shown below in Figure 3.


Sorting  Operations 


Doc. Control   FID              ODYR             RID              ODNR
Number         (Copy 6 date,    (Copy 1 date,    (Copy 6 date,    (Copy 1 date,
               when no 1)       when have 6)     when have 1)     when no 6)

1              4(365)           0                0                0
2              0                5(002)           5(003)           0
3              0                6(205)           6(206)           0
4                                                                 6(115)

Look at the FID or RID, the item installation date. Then look at the ODNR,
to see if the item has been removed at a later date.

In this set, the highest installation date is RID = 6(206).

This  item  has  not  been  removed  at  any  date  later  than  6(206),  as  shown 
in  the  ODNR  column,  because  the  highest  ODNR  is  6(115). 

Therefore, the test consists of 2 steps:

Step 1) Find the highest installation date, whether it is FID or RID.
In our sample, it is RID = 6(206).

Step 2) Test whether this RID is greater than the highest value of
"Off Date - Not Reinstalled" (ODNR). Here it is 6(115), so the answer is
"Yes." This item is then treated as an "Unfailed Item."

Note  that  this  ODNR  column  would  be  all  zeroes,  for  an  Unfailed  Item, 
if  all  were  complete.  However,  if  a  Copy  6  were  deleted  by  the  Validation 
Tests,  a  number  such  as  6(115)  would  appear  here. 
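The two-step test can be sketched in a few lines of Python. This is only an illustration and not the Fortran program of the report. The function name `is_unfailed`, the dictionary layout, and the coding of a date such as 6(206) as the integer 6206 (year digit followed by day of year, with 0 for an empty column) are assumptions made for the example; the empty cells of record 4 are taken as zero.

```python
def is_unfailed(records):
    """Apply the two-step test to the 2410 records of one serial number.

    Each record is a dict with integer entries fid, odyr, rid and odnr.
    A date such as 6(206) is coded here as 6206 and an empty column as 0
    (an assumed encoding; it only orders dates correctly within one decade).
    """
    # Step 1: highest installation date, whether FID or RID.
    last_install = max(max(r["fid"], r["rid"]) for r in records)
    # Step 2: compare it with the highest "Off Date - Not Reinstalled".
    last_removal = max(r["odnr"] for r in records)
    return last_install > last_removal


# Sample set of Figure 3 (cells not shown for record 4 are assumed to be 0).
sample = [
    {"fid": 4365, "odyr": 0,    "rid": 0,    "odnr": 0},
    {"fid": 0,    "odyr": 5002, "rid": 5003, "odnr": 0},
    {"fid": 0,    "odyr": 6205, "rid": 6206, "odnr": 0},
    {"fid": 0,    "odyr": 0,    "rid": 0,    "odnr": 6115},
]

print(is_unfailed(sample))
```

For the sample set the highest installation date is 6206 and the highest ODNR is 6115, so the item is classed as unfailed.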

After identification of an unfailed item, the number of hours logged on
this unfailed item (UFH) by the end of the calendar period must be found.

This UFH is given by the following formula:

UFH = FFH - OFH

where OFH = Original Flying Hours, and FFH = Final Flying Hours. OFH is found
by searching the 2408-3 data tape. The file is entered with the Tail Number of
the aircraft that the component was installed on, and searched for the record
whose date is nearest to the date of component installation (RID). From this
record is read the number of flying hours on the aircraft at installation time.

FFH is found by again searching the 2408-3 data tape in a similar manner.
But this time a search is made for the record whose date is nearest to the
date at the end of the test period (DEP). One test period will be for the
past year, and the other will go back to the oldest record in the file.

For each unfailed item a new record is generated, resembling the record
that indicated an unfailed item. However, the number of unfailed hours is
entered on this record.

It  must  be  noted,  however,  that  in  the  analysis  part  of  the  program, 
there  is  a  condition  on  the  acceptance  of  an  unfailed  item  into  the  group  of 
failed  items.  The  item  is  included,  if  and  only  if,  the  number  of  unfailed 
hours  is  equal  to,  or  greater  than  the  largest  number  of  hours  logged  in  a 
component  that  failed. 

So,  if  one  Serial  Number  item  operated  for  200  hours,  and  then  failed, 
another  serial  number,  with  the  same  part  number,  cannot  be  called  an  unfailed 
item  until  it  has  run  at  least  200  hours. 
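The calculation of UFH and the acceptance condition may be sketched as follows. The function names, the layout of the 2408-3 log as (date, flying hours) pairs, and the use of plain day counts for dates are assumptions made for the illustration; the actual tape search is not described in enough detail to reproduce.

```python
def nearest_hours(log, date):
    """Flying hours from the log record whose date is nearest to `date`.

    `log` is a list of (date, flying_hours) pairs for one tail number;
    dates are plain day counts here (an assumption of this sketch).
    """
    return min(log, key=lambda rec: abs(rec[0] - date))[1]


def unfailed_hours(log, install_date, end_of_period):
    """UFH = FFH - OFH, both read from the record nearest to the date."""
    ofh = nearest_hours(log, install_date)    # Original Flying Hours
    ffh = nearest_hours(log, end_of_period)   # Final Flying Hours
    return ffh - ofh


def accept_unfailed(ufh, failed_hours):
    """Keep an unfailed item only if it has run at least as long as the
    longest-lived component that failed."""
    return ufh >= max(failed_hours)


# Hypothetical aircraft log and failure times, for illustration only.
log = [(100, 1200.0), (160, 1290.0), (250, 1415.0), (340, 1530.0)]
ufh = unfailed_hours(log, install_date=155, end_of_period=345)
print(ufh, accept_unfailed(ufh, [120.0, 200.0]))
```

With these hypothetical figures the item has logged 240 hours, which exceeds the longest failure time of 200 hours, so it is admitted to the analysis.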

This  completes  the  discussion  of  the  data  tape. 

The  next  item  to  be  discussed  will  be  the  Format  which  is  the  output 
of  the  program. 

4. OUTPUT FORMAT OF RESULTS. The output, Fig. 4, shows the desired values
that the program determines.

The header lists the interchangeable parts and information on the Prime
Part Number (i.e., the part number being analysed).

The objectives of the program are:

(1) The data TBO or Finite Life, and the Burn-In Time, which will be
compared with the Established Value by the Contractor.

(2) The next objective is the Composite Probability Density Distribution,
for all 3 Life Phases: 1) Burn-In Time, 2) Random Failure Phase and 3) Wearout
Phase.

It is necessary to find if the failure rate follows an exponential
distribution (B = 1) or a normal distribution (B = 2.6) or some other of the
Weibull family; Fig. 5. This distribution is a function of 3 parameters:
Gamma (g), which is the shift parameter, Beta (B), which is the shape
parameter, and Eta (N), which is the characteristic life parameter.

(3) Life characteristics are then found from the Weibull Distributions.
These are: Reliability, Hazard Rate, Variance, Expected Value, etc. An
example of Hazard Rate as a function of B is shown in Fig. 6. Fig. 7 shows
an Average Hazard Rate for an actual item.

5. MATHEMATICAL DEVELOPMENT. The determination of the first objective
will be presented. We want to determine the TBO, or Finite Life, and the
Burn-In time. This will be presented graphically, first. See Fig. 8.
PLOT OF THE WEIBULL PROBABILITY DENSITY FUNCTION FOR VARIOUS VALUES OF BETA (N = 1) (g = 0)


THE WEIBULL PLOT OF FAILURE-RATE VERSUS TEST DURATION


An ordered plot is made of all times of failure on the so-called Kao
Weibull Paper.

This graph paper has the scales chosen such that a plot of a straight
line on this paper yields the shape parameter "B" of the Weibull p.d.f.
However, the original plot of the raw data will not yield straight line
segments. The linearization process will be discussed later.

Now, assuming a plot as shown on Fig. 8, the TBO, or Finite Life, is
defined as the intersection of the two tangent lines. The value is read
from the horizontal scale, indicated by the symbol Delta (S). This point
is indicated by the maximum positive change of slope between any two data
points.*

Next, the Burn-In time will be determined. (Turn the figure upside down.)
Suppose that the failure points fell along this plot. This plot shows that
this component has a Burn-In time at Delta. Beyond the Burn-In time, the
failure rate gets much better, and becomes significantly less. This point
is indicated by the maximum NEGATIVE rate of change of slope between any
two points.

Now that our 1st objective has been reached, the 2nd objective will be
determined. This is the identification of the type of the Weibull 3-Phase
Composite Cumulative Probability Density Distribution.

\[ F(t) = \sum_{i=1}^{3} A_i \{ 1 - \exp[ -((t - g_i)/N_i)^{B_i} ] \} \]

The Weibull distribution is identified by the 3 parameters, Beta, Eta,
and Gamma. The coefficients A_i represent the proportion of failures in
each phase.


For a "Kao Plot" consisting of only one straight line segment (see figure):

\[ f(t) = (Beta/Eta) [(t - Gamma)/Eta]^{Beta - 1} \exp\{ -[(t - Gamma)/Eta]^{Beta} \} \]

A set of 3 parameters is needed for each straight segment of the plot.
There may be a maximum of 3 straight line segments.


*Delta (S) may be found mathematically. It is the time at which the proportion
of random failures, indicated by the lower straight section, is equal to the
proportion of wearout failures, indicated by the upper straight section. It is
equated below:

\[ 1 - \exp[ -(S/N_2)^{B_2} ] = 1 - \exp[ -(S/N_3)^{B_3} ] \]

\[ S = \exp[ (B_2 \ln N_2 - B_3 \ln N_3)/(B_2 - B_3) ]. \]
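The footnote formula can be checked numerically. The short Python sketch below evaluates Delta and confirms that the two proportions are equal at that time. The function name `delta_time` and the parameter values are hypothetical and are chosen only to exercise the formula; they are not taken from the report.

```python
import math


def delta_time(b2, n2, b3, n3):
    """Delta (S): time at which the random-phase and wearout-phase
    proportions are equal, per the footnote formula."""
    return math.exp((b2 * math.log(n2) - b3 * math.log(n3)) / (b2 - b3))


# Hypothetical parameters: random phase (B2, N2), wearout phase (B3, N3).
b2, n2, b3, n3 = 1.0, 40.0, 3.5, 25.0
s = delta_time(b2, n2, b3, n3)

f_random = 1.0 - math.exp(-((s / n2) ** b2))
f_wearout = 1.0 - math.exp(-((s / n3) ** b3))
print(s, f_random, f_wearout)
```

The formula fails when B2 = B3, since the two straight sections are then parallel and have no single crossing point.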
[Figure: Weibull probability, Kao plane. Failure age, Gamma = 1500 hrs.]

Beta (the Shape Parameter) is found from the slope of each straight line
segment (or sub-population).

Eta, the characteristic life parameter, denotes the failure age of 63.2%
of the items. From the graph, Eta can be read directly by entering the
vertical scale at 63.2, and progressing to the intersection with the straight
line segment. Then read Eta on the horizontal scale, directly under this
intersection. For computer purposes, the equations of the two lines involved
must be formulated. The equation for Eta turns out to be:

\[ Eta = \exp[ -(YY/B) + XX ] \]

where XX refers to the mean data point on the horizontal axis, and YY refers
to the mean data point on the vertical axis.

The exponential function is involved because the XX and YY values are
actually logarithmic values to the base "e". Therefore, the anti-log must
be taken.
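A minimal Python sketch of this calculation follows. It assumes the straight line is written as Y = B ln(x - g) - B ln(N), with Y = ln ln[1/(1 - F)], so that the line passes through the mean point (XX, YY) and Eta = exp(XX - YY/B). The use of a least-squares slope for Beta, the function name `beta_eta`, and the synthetic data are assumptions of the sketch, not the method of the report.

```python
import math


def beta_eta(x_shifted, fraction_failed):
    """Estimate Beta (slope) and Eta from points already shifted by Gamma.

    x_shifted       : failure ages minus Gamma
    fraction_failed : cumulative fraction failed F at each age (0 < F < 1)
    """
    X = [math.log(x) for x in x_shifted]
    Y = [math.log(math.log(1.0 / (1.0 - f))) for f in fraction_failed]
    xx = sum(X) / len(X)                      # mean point, horizontal axis
    yy = sum(Y) / len(Y)                      # mean point, vertical axis
    # Least-squares slope (an assumed choice of line-fitting method).
    beta = (sum((a - xx) * (b - yy) for a, b in zip(X, Y))
            / sum((a - xx) ** 2 for a in X))
    eta = math.exp(xx - yy / beta)            # anti-log of ln(Eta)
    return beta, eta


# Synthetic check: exact Weibull points with Beta = 2 and Eta = 25.
F = [i / 11.0 for i in range(1, 11)]
x = [25.0 * (-math.log(1.0 - f)) ** 0.5 for f in F]
beta, eta = beta_eta(x, F)
print(beta, eta)
```

Because the synthetic points lie exactly on a Weibull line, the sketch returns Beta and Eta very close to 2 and 25.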

Gamma, the Shift Parameter, is found by determination of the data-shift
used to convert non-linear data into linear data. This determination will
be discussed later. Fig. 9 shows that the final value of Gamma should check
with the value of the intercept on the horizontal scale. The physical meaning
of Gamma is the time before which no failures had a chance of occurring.

The method for determination of Gamma follows:

\[ F(x) = 1 - \exp\{ -[(x - g)/N]^{B} \}, \quad x, B, N > 0 \]

where F(x) is the Cumulative Probability Distribution.

\[ 1 - F(x) = \exp\{ -[(x - g)/N]^{B} \} \]

\[ 1/(1 - F(x)) = \exp\{ [(x - g)/N]^{B} \} \]

\[ \ln[1/(1 - F(x))] = [(x - g)/N]^{B} \]

\[ \ln \ln[1/(1 - F(x))] = B \ln[(x - g)/N] = B \ln(x - g) - B \ln N. \]

This equation shows that a plot of the left hand side versus \ln(x - g) will
yield a straight line, with a slope equal to B.

[F(x)](100) = Percent Failure; in other words, this is the percent
of failures that we can expect in the time "x".

The y-intercept, [-B ln(N)], is a measure of the Goodness-of-Fit of the
Weibull plot. It can be compared with the value found geometrically
at the x = 1 vertical line, which is the ln x = 0 line. See Fig. 9, which
shows sample data for Gamma = 20.

The original data yielded Curve A. Since this is not linear, the slope,
Beta, cannot be determined yet.

The first guess of Gamma may be taken graphically by extending the curve
down to the horizontal axis. This gives Gamma = 1500 hrs. Since a computer
program does not have access to the graph, the first guess of Gamma is taken
to be (2/3)(Time of First Failure), i.e. (2/3)(Time of 1st Data).

To apply this first guess of Gamma, the value of 1500 hrs. is subtracted
from each time of failure.


Curve B is an example of a plot of data which was adjusted for a value
of Gamma = 27.5 Hecto-hours. The fact that these two curves (A and B)
have opposite curvatures indicates that the true value of Gamma lies somewhere
between 15. and 27.5. Further trials showed that Gamma = 20 is the value
that linearizes the curve. This is shown on Curve C.

SUCCESSIVE DETERMINATION OF GAMMA.

The equation that gives us the next value of Gamma to try is developed next.

Refer to Curve A of Figure 9. It is known, in this case, that if the
correct value of g = 20 is subtracted from the X-value of each data
point, the result is the linear Curve C. The problem is to set up an equation
for g, which will first be developed for the simple case of only 3 data points.

DETERMINATION OF GAMMA FOR 3 IDEAL POINTS.

As seen on Figure 10, the point (Y2) is ideally located equidistant
between Y1 and Y3.

In addition, there are only 3 points, which is the simplest case possible.
The analysis is shown in Figure 10.

The difficulty with this equation is its sensitivity to X1 or X3.

For example: If X1 = 33.3, g = 8, and if X1 = 35.3, g = 28.

Also: 1) Always subtract off (2/3)(1st data pt) to get first guess for
g, to reduce sensitivity. 2) Reiterate. Also use "g" equation in the form
DETERMINATION OF GAMMA (g) FOR 3 IDEAL POINTS
Phase 2

Set the change of Slope (\Delta S) = 0, and solve
for Gamma.

\[ \Delta S = Slope_2 - Slope_1 = 0 \]

But Y upper = Y lower:

\[ \ln[(x_2 - g)/(x_1 - g)] = \ln[(x_3 - g)/(x_2 - g)] \]

Take Anti-Ln of each side.

\[ (x_2 - g)/(x_1 - g) = (x_3 - g)/(x_2 - g) \]

\[ (x_2 - g)^2 = (x_1 - g)(x_3 - g) \]
\[ x_2^2 - 2 x_2 g + g^2 = x_1 x_3 - x_1 g - x_3 g + g^2 \]
\[ x_2^2 - x_1 x_3 = g(2 x_2 - x_1 - x_3) \]

where x_2 must be chosen opposite to Y_2 on the Data Plane.

Fig. 11
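The last line of the figure, solved for g, gives g = (x2^2 - x1 x3)/(2 x2 - x1 - x3). A minimal Python sketch of this one formula is given below; the function name `gamma_three_points` is an assumption made for the illustration. The first call uses three hypothetical points whose shifted values (10, 20, 40) form a geometric progression about g = 20, and the second uses the values x1 = 34.3, x2 = 41.59, x3 = 52.6 of the 10-point case worked later in the paper.

```python
def gamma_three_points(x1, x2, x3):
    """Shift parameter g from three data values, where x2 is the value
    lying opposite the mid-point Y2 of Y1 and Y3 in the Kao plane."""
    denom = 2.0 * x2 - x1 - x3
    if denom == 0.0:
        # x2 is the arithmetic mean of x1 and x3: no finite shift exists.
        raise ValueError("2*x2 - x1 - x3 is zero; g is undefined")
    return (x2 * x2 - x1 * x3) / denom


# Hypothetical ideal points: (x - 20) = 10, 20, 40.
print(gamma_three_points(30.0, 40.0, 60.0))

# Values of the worked 10-point case.
print(gamma_three_points(34.3, 41.59, 52.6))
```

The denominator is a small difference of nearly equal numbers, which is the source of the sensitivity to x1 and x3 noted in the text.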
DETERMINATION OF GAMMA (g) FOR ACTUAL CASE OF
10 POINTS - Phase 1.

1. DIVIDE DATA INTO UPPER AND LOWER DATA,
WITH THE DIVIDING LINE TAKEN AS THE AVERAGE (x_0)

x_0 = (27.5 + 31. + 34. + 38. + 41. + 44. + 47. + 51. + 57. + 64.)/10.

x_0 = 43.4

2. SET x_3 = AVERAGE OF UPPER.

x_3 = (44. + 47. + 51. + 57. + 64.)/5. = 52.6

3. SET x_1 = AVERAGE OF LOWER.

x_1 = (27.5 + 31. + 34. + 38. + 41.)/5. = 34.3

4. NEXT, FIND x_2(Y_0):

WHERE Y_0 = AVERAGE OF Y_1 AND Y_3.

Y_L = Y_1 AND Y_U = Y_3 FOR CONVENIENCE

Fig. 12
DETERMINATION OF GAMMA (g) FOR ACTUAL CASE OF
10 POINTS - Phase 2.

FIND Y_L FOR x_L = 34.3, (X_L = ln 34.3 = 3.54)

1. INTERPOLATE BY DRAWING CURVE IN THE FUNCTIONAL,
OR KAO PLANE, THROUGH THE 3 POINTS
CLOSEST TO x_L.

THESE ARE x = 31, 34, and 38.

x      X      y      Y
31     3.43   18     1.63
34     3.52   27     1.3
38     3.64   36     0.82

Y_L = a + b(X_L) + c(X_L)^2

WHERE a, b and c follow.
Y_L = 1.26

Fig. 13
DETERMINATION OF GAMMA (g) FOR ACTUAL CASE OF
10 POINTS - Phase 3.

\[ a = [Y_1 X_2 X_3 (X_3 - X_2) + Y_2 X_1 X_3 (X_1 - X_3) + Y_3 X_1 X_2 (X_2 - X_1)] / D \]

\[ D = X_2 X_3 (X_3 - X_2) + X_1 X_3 (X_1 - X_3) + X_1 X_2 (X_2 - X_1) \]

\[ b = [Y_1 (X_2^2 - X_3^2) + Y_2 (X_3^2 - X_1^2) + Y_3 (X_1^2 - X_2^2)] / D \]

\[ c = [Y_1 (X_3 - X_2) + Y_2 (X_1 - X_3) + Y_3 (X_2 - X_1)] / D \]

These are the equations for a parabolic fit
through 3 points.

\[ Y = a + bX + cX^2 \]

DETERMINATION OF GAMMA (g) FOR ACTUAL CASE OF
10 POINTS - Phase 4


SIMILAR TO THE METHOD FOR FINDING Y_L, Y_U
IS FOUND FOR x_U = 52.6 (X_U = ln 52.6 = 3.98)


THE PARABOLA IS DRAWN THROUGH THE 3 POINTS
CLOSEST TO 52.6, WHICH ARE SHOWN BELOW.

x      X      y      Y
47     3.85   64.    -.02
51     3.93   73.    -.4
57     4.04   82.    -.52

Y_U = -.44 at X_U = 3.98

Y_0 = (Y_L + Y_U)/2 = (1.26 - .44)/2 = +.41

These are Kao values.

They are the values obtained after going thru
ln ln [100/(100 - % Failures)]

Fig. 15
DETERMINATION OF GAMMA (g) FOR ACTUAL CASE OF
10 POINTS - Phase 5


KNOWING Y_0 = +.41, X_0 IS FOUND BY DRAWING
A PARABOLA THROUGH THE 3 Y POINTS CLOSEST TO
Y_0 = +.41. NOTE THAT THE ROLES OF X AND Y ARE
REVERSED.

X_0 = a + bY_0 + cY_0^2

THE CLOSEST POINTS ARE

y_1 = 36.,  Y_1 = +.8
y_2 = 46.,  Y_2 = +.45
y_3 = 55.,  Y_3 = +.2

X_0 = 3.72 at Y_0 = +.41, and x_D = 41.59

2. FINALLY

\[ g = (x_D^2 - x_1 x_3)/(2 x_D - x_1 - x_3) = [(41.59)^2 - (34.3)(52.6)]/[(2)(41.59) - 34.3 - 52.6] \]

20.0 = g

This is the true value of Gamma.

Fig. 16


\[ g_e = [x_2^2/x_1 - x_3] / [2 x_2/x_1 - 1 - x_3/x_1] \]

Then average g = (g + g_e)/2.

If double precision is available, the averaging may be eliminated.

It has been found that we get better results using the lowest data value and
the average of the top half of the data, instead of the average of the bottom
half and the average of the top half. This ensures that Gamma (g) will be less
than the lowest data value of "X". A value of Gamma greater than the lowest
data value of "X" is erroneous and unacceptable, due to the basic definition
of Gamma.

Gamma is the value below which no failure, or X value, can occur.

Another method will be discussed next. This second method is the
Average Rate of Change of Slope Method (ARCS).

GAMMA ADJUSTMENT BY AVERAGE RATE OF CHANGE OF SLOPES (ARCS)

1. In each life phase, find the slope between each 2 points in the
Kao Plane, using ln values, as shown in Fig. 17.

\[ S_i = (Y_{i+1} - Y_i)/(X_{i+1} - X_i) \]

2. Find the rate of change of slope between each two slopes,

\[ (RCS)_i = (S_{i+1} - S_i)/(Xmid_{i+1} - Xmid_i). \]

3. Find the Average Rate of Change of Slope (ARCS),

\[ ARCS = (\sum RCS)/n, \]

where n is the number of RCS values.

Since this is the ARCS for the original data, for which there was no
adjustment for Gamma, call this value ARCS (Gamma = 0), or ARCS(0).

4. Again use the method above to find ARCS (Gamma = first data point), or
ARCS(1). To do this, subtract this first (X-data) point from each successive
data point, to establish a new set of data. Then put these data into the Kao
Plane by taking the "ln" function shown in the sketch. Then find ARCS(1).

5. Next find ARCS(2/3). Subtract (2/3) of the first (X-data) point
from each data point, to form a new set of data. Again put the data into the
Kao Plane. Then find ARCS(2/3).

6. Write a 2nd order equation through the plot of Gamma versus ARCS (A),
Gamma = aA^2 + bA + C. Then solve for the value of Gamma (Gamma = C) that
makes ARCS = 0. The value of C(A_1, A_2, A_3) is:

\[ C = [g_1 A_2 A_3 (A_3 - A_2) + g_2 A_1 A_3 (A_1 - A_3) + g_3 A_1 A_2 (A_2 - A_1)] / [A_2 A_3 (A_3 - A_2) + A_1 A_3 (A_1 - A_3) + A_1 A_2 (A_2 - A_1)]. \]

7. As the new Gamma is found, this new value of Gamma is subtracted from
each of the X-values of the data points, to get the adjusted data points. Then
a new value is found for the Average Rate of Change of Slope (ARCS), to check
its approach to zero. If this ARCS is within ±0.1, or if this new ARCS does
not improve the previous ARCS by at least 20%, the iteration is complete.
The formula conditions for stopping iteration are:

1) |ARCS| < .1
or
2) |ARCS_{i+1}/ARCS_i| > .8
or
3) (ARCS)_{i+1} > (ARCS)_i for ARCS_i positive
or
4) (ARCS)_{i+1} < (ARCS)_i for ARCS_i negative.

8. If the iteration is not complete by these tests, the largest ARCS is
dropped, and the iteration is continued, using the three lowest values of
ARCS with their three corresponding values of Gamma. See the next Figure (18).
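Steps 1 through 3 and the zero-crossing of step 6 may be sketched in Python as follows. This is an illustration under stated assumptions, not the report's program: Xmid is taken as the mid-point of each pair of ln(x - Gamma) values, the vertical coordinate is Y = ln ln[1/(1 - F)], the fraction failed is taken as i/(n + 1), and the function names and synthetic data are hypothetical.

```python
import math


def arcs(x, fraction_failed, gamma=0.0):
    """Average Rate of Change of Slope in the Kao plane for a trial Gamma."""
    X = [math.log(xi - gamma) for xi in x]
    Y = [math.log(math.log(1.0 / (1.0 - f))) for f in fraction_failed]
    # Step 1: slope between each 2 successive points.
    slopes = [(Y[i + 1] - Y[i]) / (X[i + 1] - X[i]) for i in range(len(X) - 1)]
    # Assumed definition of Xmid: mid-point of each interval.
    mids = [(X[i] + X[i + 1]) / 2.0 for i in range(len(X) - 1)]
    # Step 2: rate of change of slope between each two slopes.
    rcs = [(slopes[i + 1] - slopes[i]) / (mids[i + 1] - mids[i])
           for i in range(len(slopes) - 1)]
    # Step 3: average.
    return sum(rcs) / len(rcs)


def gamma_at_zero_arcs(g, A):
    """Step 6: constant term C of the parabola Gamma = a*A**2 + b*A + C
    through three (ARCS, Gamma) pairs, i.e. Gamma at ARCS = 0."""
    (g1, g2, g3), (a1, a2, a3) = g, A
    num = (g1 * a2 * a3 * (a3 - a2) + g2 * a1 * a3 * (a1 - a3)
           + g3 * a1 * a2 * (a2 - a1))
    den = a2 * a3 * (a3 - a2) + a1 * a3 * (a1 - a3) + a1 * a2 * (a2 - a1)
    return num / den


# Synthetic exact Weibull data: Gamma = 20, Eta = 25, Beta = 2.
F = [i / 11.0 for i in range(1, 11)]
x = [20.0 + 25.0 * (-math.log(1.0 - f)) ** 0.5 for f in F]

print(arcs(x, F, gamma=0.0))    # curved plot: ARCS well away from zero
print(arcs(x, F, gamma=20.0))   # linearized plot: ARCS essentially zero
```

On exact Weibull data the ARCS vanishes only at the true shift, which is the property the iteration of steps 4 through 8 exploits.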

FINDING GAMMA (g) BY GEOMETRIC MEAN METHOD. This method is an
improvement of Dubey's Method, found on Page 293 of Technometrics, May 1967.

The first step is the calculation of Y_{MID} using data values.

The Weibull Plot is a plot of \ln \ln[1/(1 - F(X))] = B \ln[(X - g)/N].

Taking the inverse ln of each side gives:

\[ y = \ln[1/(1 - F(X))] = [(X - g)/N]^{B} = A(X - g)^{B} \]
where A = [1/N]^{B}.

The constant "g" may be found by writing 3 equations, which are taken
from three data points, since there are 3 unknowns: g, B, and N.

X_1, F_1(X); X_2, F_2(X); and X_3, F_3(X)

\[ y_1 = A(X_1 - g)^{B} \]
\[ y_2 = A(X_2 - g)^{B} \]
\[ y_3 = A(X_3 - g)^{B} \]

Then:

\[ y_1 y_2 - y_3^2 = A^2 [ (X_1 - g)^{B} (X_2 - g)^{B} - (X_3 - g)^{2B} ] \]

Now if y_3 = \sqrt{y_1 y_2}, the LHS = 0, and we can solve for "g".

\[ (X_1 - g)(X_2 - g) = (X_3 - g)^2 \]

\[ X_1 X_2 - X_1 g - X_2 g + g^2 = X_3^2 - 2 X_3 g + g^2 \]
\[ g[2 X_3 - X_1 - X_2] = X_3^2 - X_1 X_2 \]
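The closing relation can be verified on synthetic data with a short Python sketch. The Weibull parameters, the two chosen values of y, and the function names are hypothetical and serve only to show that, when y3 is the geometric mean of y1 and y2, the formula returns the shift exactly. Locating X3 in real data requires interpolation, which this sketch does not attempt.

```python
import math


def gamma_geometric_mean(x1, x2, x3):
    """g from three data values, where x3 is the value whose
    y = ln[1/(1 - F)] equals the geometric mean of y1 and y2."""
    return (x3 * x3 - x1 * x2) / (2.0 * x3 - x1 - x2)


# Hypothetical Weibull parameters for the check.
g_true, eta, beta = 20.0, 25.0, 2.0


def x_at(y):
    """Data value at which ln[1/(1 - F)] = y, from y = [(x - g)/N]**B."""
    return g_true + eta * y ** (1.0 / beta)


y1, y2 = 0.2, 1.8
y3 = math.sqrt(y1 * y2)            # geometric mean of y1 and y2
g_est = gamma_geometric_mean(x_at(y1), x_at(y2), x_at(y3))
print(g_est)
```

The result does not depend on Beta or Eta, since both cancel once y3 is set to the geometric mean.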
GAMMA VS RATE-OF-CHANGE OF SLOPE
(ARCS) AS 2ND ORDER EQUATION


where: A = ARCS
X_{33}(Y_{22} - Y_{11}) + X_{11}(Y_{33} - Y_{22}) + X_{22}(Y_{11} - Y_{33})


Then the data value of X_{MID} is found by taking the
anti-ln of X_{MID} (Kao Value):

\[ X_{MID} = e^{X_{MID} (Kao Value)} \]

Then the value of Gamma is given by the same formula, Eqn A, used
in a previous method:

\[ g = (X_{MID}^2 - X_1 X_3)/(2 X_{MID} - X_1 - X_3) \]

In the above formula, it has been found that the use of ln to the
base "e" values for the X's gives a more exact value for g, but only if
the equation is used repeatedly until no significant change occurs.

This same concept of repeated usage must be applied to the equation
for "g", regardless of the method used, in order to find the true value for
g.


After determination of Gamma, all of the 3 Weibull parameters are
complete. Next will be discussed the information obtained from the
Probability Distribution. Reliability is first.

Since there will be 3 values of each of the parameters, Gamma, Beta,
and Eta, the composite Reliability is given by the sum of 3 terms (Term 1,
Term 2, Term 3).

\[ R(t) = J \exp\{ -[(t - g_1)/N_1]^{B_1} \} + P \exp\{ -[(t - g_2)/N_2]^{B_2} \} + Q \exp\{ -[(t - g_3)/N_3]^{B_3} \} \]

where g_3 > g_2 > g_1, and J + P + Q = 1,
where term 3 is set equal to Q for t < g_3,
term 2 is set equal to P for t < g_2.

J = Percentage of Data Points in Burn-In Phase
P = Percentage of Data Points in Catastrophic or Random Phase
Q = Percentage of Data Points in the Wearout Phase.
R(t) = 1 for t < g_1.

A plot is shown, Fig. 19.
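The three-term composite reliability can be sketched in Python as below. The function name and the tuple layout are assumptions of the sketch. The weights, Betas, N1, N2, g2 and g3 are the synthetic values listed later in the paper; g1 = 5 and N3 = 5.1 are assumed here because they are not legible in the listing.

```python
import math


def composite_reliability(t, phases):
    """R(t) as the sum of three weighted Weibull reliability terms.

    `phases` is a list of (weight, gamma, beta, eta); the weights
    (J, P, Q) sum to 1. A term equals its weight until t passes the
    shift parameter of its phase.
    """
    total = 0.0
    for weight, g, b, n in phases:
        if t <= g:
            total += weight
        else:
            total += weight * math.exp(-(((t - g) / n) ** b))
    return total


# (weight, Gamma, Beta, Eta); g1 = 5 and N3 = 5.1 are assumed values.
phases = [
    (0.294, 5.0, 3.75, 1.34),    # Burn-In phase (J)
    (0.353, 10.0, 0.621, 5.1),   # Random phase (P)
    (0.353, 15.0, 3.55, 5.1),    # Wearout phase (Q)
]

for t in (0.0, 6.0, 12.0, 20.0, 50.0):
    print(t, composite_reliability(t, phases))
```

The composite starts at 1 and falls in three stages as t passes g1, g2 and g3 in turn.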

Next, Hazard Rate and Reliable Life.

COMPOSITE HAZARD RATE. The Hazard Rate H(t) is the Conditional
Probability Density Function of Time to Failure, given that the item has
not failed prior to time (t). In other words:

\[ H(t) dt = P[(t < T < t + dt) | (T > t)] \]

It can also be stated as a ratio of probabilities:

\[ H(t) dt = P[(t < T < t + dt) \cap (T > t)] / P(T > t) \]

where \cap means "Intersect".

\[ H(t) = f_i(t)/R(t) = (probability density function)/(reliability function) = A_i B_i (N_i)^{-B_i} (t - g_i)^{B_i - 1} \]

A plot of Hazard Rate for G.E. task A, Lot 3 is shown (Fig. 7), and a
composite Hazard Rate is shown in Fig. 20.


Other important parameters are the Expected Value of Time to Failure
and the Variance of the Time to Failure.


The Expected Value (E) and Variance (V) are given for each life phase.
Here Gamma( ) denotes the gamma function, not the shift parameter.

\[ E_i = g_i + N_i [Gamma(1/B_i + 1)], \quad i = 1,2,3, \]

\[ V_i = N_i^2 \{ Gamma(2/B_i + 1) - [Gamma(1/B_i + 1)]^2 \}, \quad i = 1,2,3. \]
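A minimal Python sketch of the two expressions follows, using the standard Weibull mean and variance in terms of the gamma function. The function name and the parameter values in the example are hypothetical. For Beta = 1 (the exponential case) the results reduce to E = g + N and V = N^2, which gives a simple check.

```python
import math


def weibull_mean_variance(g, b, n):
    """Expected value and variance of a 3-parameter Weibull life phase.

    g : shift parameter, b : shape parameter (Beta), n : Eta.
    math.gamma is the gamma function.
    """
    g1 = math.gamma(1.0 / b + 1.0)
    g2 = math.gamma(2.0 / b + 1.0)
    mean = g + n * g1
    variance = n * n * (g2 - g1 * g1)
    return mean, variance


# Exponential case (Beta = 1): mean = g + N, variance = N**2.
print(weibull_mean_variance(20.0, 1.0, 25.0))

# A wearout-type phase with hypothetical parameters.
print(weibull_mean_variance(15.0, 3.55, 5.1))
```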

RELIABLE LIFE. The last parameter found is the Reliable Life, RL(C),
for a specified confidence level (C). The formula is given next.

\[ C = \int_{RL}^{\infty} f(t) dt \]

\[ RL = (Gamma) + (Eta)(-\ln C)^{1/Beta} \]

RL is found for each component for confidence levels of .85, .90, and .95.
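The Reliable Life formula follows from setting the single-phase reliability exp{-[(RL - Gamma)/Eta]^Beta} equal to C and solving for RL. A short Python sketch is given below; the function name and the parameter values are hypothetical. The loop confirms that the reliability evaluated at RL returns the level C.

```python
import math


def reliable_life(c, g, b, n):
    """Age RL at which the reliability of one Weibull phase equals c.

    c : required level (0 < c < 1), g : Gamma, b : Beta, n : Eta.
    """
    return g + n * (-math.log(c)) ** (1.0 / b)


# Hypothetical component: Gamma = 20, Beta = 2, Eta = 25.
for c in (0.85, 0.90, 0.95):
    rl = reliable_life(c, 20.0, 2.0, 25.0)
    check = math.exp(-(((rl - 20.0) / 25.0) ** 2.0))
    print(c, rl, check)
```

A higher required level gives a shorter Reliable Life, and RL approaches Gamma as C approaches 1.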
Composite Hazard Rate Evaluation - Synthetic Data

\[ H = 0, \quad t < g_1, \]

\[ H = J B_1 (N_1)^{-B_1} (t - g_1)^{B_1 - 1}, \quad g_1 < t < g_2, \]

\[ H = P B_2 (N_2)^{-B_2} (t - g_2)^{B_2 - 1}, \quad g_2 < t < g_3, \]

\[ H = Q B_3 (N_3)^{-B_3} (t - g_3)^{B_3 - 1}, \quad g_3 < t. \]

Figure 20

DATA

J = .294     B_1 = 3.75     N_1 = 1.34
P = .353     B_2 = .621     N_2 = 5.1              g_2 = 10.
Q = .353     B_3 = 3.55     N_3 = (not legible)    g_3 = 15.

t       H          t       H
0-5     0          14      .0470
6       .3675      15      .0433
7       2.46       16      .0038
8       7.53       17      .0223
9       16.5       18      .0627
10      30.9       19      .130
11      .0797      20      .231
12      .0613      30      3.8
13      .0526      50      32.6
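The tabulated values can be reproduced approximately with the piecewise expressions above. In the Python sketch below the function name and tuple layout are assumptions. The values g1 = 5 and N3 = 5.12 are not legible in the data listing; they are inferred here from the table itself (H is zero through t = 5, and H = .0038 at t = 16) and should be treated as assumptions.

```python
def synthetic_hazard(t, phases):
    """Piecewise composite hazard: the phase with the largest Gamma
    below t supplies H = A * B * N**(-B) * (t - g)**(B - 1)."""
    active = None
    for phase in phases:            # phases are ordered by Gamma
        if t > phase[1]:
            active = phase
    if active is None:
        return 0.0                  # before the first shift parameter
    weight, g, b, n = active
    return weight * b * n ** (-b) * (t - g) ** (b - 1.0)


# (weight, Gamma, Beta, Eta); g1 = 5 and N3 = 5.12 are inferred values.
phases = [
    (0.294, 5.0, 3.75, 1.34),
    (0.353, 10.0, 0.621, 5.1),
    (0.353, 15.0, 3.55, 5.12),
]

for t in (3, 6, 7, 11, 12, 17, 20):
    print(t, round(synthetic_hazard(float(t), phases), 4))
```

The computed values agree with the table to within about one percent, which supports the reading of the data listing adopted here.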

6. PREPARATION OF TAERS COMPUTER TAPE. Layout for RD-2410 Work-Tape
for Weibull Interchangeable Part Program.

1. Prepare RD-CH-47 Work Tape. Save the program used to do this.

Table I, which follows Para 10, shows the location of each Variable on
the existing ADP-2410 Tape (contact Mr. C.P. Marquardt of AVCOM) and the
desired new location on the RD-CH-47 Work Tape.

2. All Variables, except as noted, must be taken from the ADP-2410
Tape, or calculated from data on this same tape. Two exceptions are the
Interchangeable Part Numbers and their associated Manufacturer's Code Number.
These are obtained from Program 25F6BE-41, "MDR Component TAERS Activity List."
This program is available from Mr. Tom Gruenningsr (phone extension 2170/2176)
at AVCOM.

3. The preparation of these two tapes required only records from the
ADP-2410 Tape that are coded 10, CH-47, Copy 1 and 6.

4. Identical information may be taken from Copy 1 or Copy 6, but NOT
both. The copy number is found in column 17.

on  me  AU,-2*10  Tape,  in  clumns  35-36.  Any  o^rYo^nnUiflZ  S,~ rd. 

ADP-2410'T,pfLC^w““!l5!  ^  U*'d-  “1S  C"-47  “U1  *«■<•  « 

6.  The  blocking  factor  is  (301.  X  1). 

the  Table  ^ill^u01*  the  RD“CH“47  taPe  muat  be  justified  as  indicated  in 
*ith  Umm.  SpSCeS  thC  l£ft  °f  Right  J ««ified  significant  data 

^-^25 

Inspection Action Code (IAC), column 273, equal to "A". This letter A
indicates that inspection revealed that the item was serviceable without
needing repair.

it  ia  ^uncCi°nal  Group  Coding  must  be  entered  in  columns  58-72. 

t  is  obtalaed  by  reading  the  Federal  Stock  Number  (FSN)  in  column*.  tra 
196  Of  the  ADP-2410  Tape.  with  thls  reK,  ,„rc“ih«  Pubil^noJlSI. 

feSwiindSu  ysa.'s-  ffiussg&'&'r a*  ^ ** -i 

11. After completion of the RD-2410 Work Tape, separate it into
the two tapes named above, based on the Julian record dates, as follows:

For this working tape find the most recent date (MRD) of the 4 dates, whether
FID (Columns 73-76); RID (Columns 77-80); ODNR (81-84); or ODYR (85-88).
Prepare 2 tapes: RD1-CH-47-CGR will contain all records whose highest of
the  4  dates  is  between  (MRD-25)  and  MRD-390.  Tape  RDL-CH-4 7! Crnlw  i f  “  -  . 
records  (MRD-390).  The  cut-off  date  (COD)  is  SKJST as (%££)  rL col 
must  be  entered  into  the  RD-2410  Work  Tape,  columns  44-47  *' 

12. The following operations must be applied to each of these 2 tapes.

13. Sort by Component Part Number, Location 13-33, so that the
lowest number will be processed first.

14. Sort by Interchangeable Part Numbers, within the "Part Number
Sort."

«...  wifi  bfpio^8sbfL:i.prior  overha“l■ ,uch  that  i,». 

The  location  of  P0  is  columns  93-94, 
TABLE 1

Variable                                    Location On                     New Location  Justify
                                            ADP-2410 Tape                   On RD-2410
                                                                            Work Tape

End Item Model                              133-144                         1-12          L
  (Leave a BLANK between CH and 47)

Component Part Number Being Processed       145-163                         13-33         L

Manufacturing Code for the Component        166-170                         34-38         L
  Part Number Being Processed

Established Time Between Overhaul (TBO)     238-241                         39-42         L
  or Established Finite Life (FL)

Calendar Group. Will be either "N",         Not found on this tape. It is   43
  meaning recent, or "O", meaning older     determined by the method
                                            explained later in para 11.

Cut-Off Julian Date (COD)                   Not on Tape, see para 11.       44-47         R

Functional Group Code                       Not on Tape, see para 10.       48-55         R

Component Serial Number                     12-28                           56-72         L

First Installation Date (FID)               This date is found in columns   73-76         R
                                            299-302, for copy 6, under the
                                            condition that a copy 1 does
                                            NOT exist for this control
                                            number. If a copy 1 does
                                            exist, then fill 73-76 with
                                            zeroes.

Date of Re-Install (RID)                    This date is found in columns   77-80         R
                                            299-302, for copy 6, under the
                                            condition that a copy 1 does
                                            exist for this control number.
                                            If a copy 1 does NOT exist for
                                            this control number, fill
                                            77-80 with blanks.

Removal Date, When No Re-Installation       This date is found in 299-302,  81-84         R
  Occurs (ODNR)                             for copy 1 under the condition
                                            that NO copy 6 exists for this
                                            control number. If a copy 6
                                            exists, fill 81-84 with
                                            blanks.

Removal Date When Re-Installation           This date is found in 299-302,  85-88         R
  Does Occur (ODYR)                         for copy 1 under the condition
                                            that copy 6 exists for this
                                            control number.

Unfailed Flying Hours                       This is not on the ADP-2410     89-92         R
                                            Tape. See para 19.

Number of Prior Overhauls (PO)              136-137                         93-94         R

Hours Usage Since New (USN)                 123-127                         95-99         R

Hours Usage Since Last Installed (ULI)      128-131                         100-103       R

Hours Usage Since Overhaul (USO)            132-135                         104-107       R

Age Group (AG), as a function of:           Not on ADP tape. It is          108-109       R
  PO, located ADP tape 136-137              calculated as in Incl 1.
  USN, located ADP tape 123-127             It is repeated here.
  ULI, located ADP tape 128-131
  USO, located ADP tape 132-135
  AG = (1/50)(IA + 25.01)
  Round off to nearest integer.
  If AG is zero or negative, set it = 1.

Failure Code of Component (FC)              264-266                         110-112       R

Failure Detected During (FDD)               270                             113           R

Effect on Mission (EOM)                     271                             114           R

Inspect & Action Code                       273                             115           R

Component Noun                              61-84                           116-139       L

Standard Unit Price                         255-263                         140-149       R

Organization Ident Code                     43-49                           150-157       L

End Item Serial Number; Tail Number         212-221                         158-167       L

First Interchangeable Part Number 1         Not on ADP-2410 Tape. Use       168-186       L
                                            program number 25F6BE-41)
                                            "MDR COMPONENT TAERS ACTIVITY
                                            List." Contact Ton Gruennlnger
                                            of AVCOM. (See para 3)

Manufacturer Code Number 1, Related         Not on ADP-2410 Tape.           189-193       L
  to Interchangeable Part Number 1          See para 3

Second Interchangeable Part Number 2        Not on ADP-2410 Tape.           194-214       L
                                            See para 3

Manufacturer Code Number 2                  Not on ADP-2410 Tape.           215-219       L
                                            See para 3

Third Interchangeable Part Number 3         Not on ADP-2410 Tape.           220-240       L
                                            See para 3

Manufacturer Code Number 3                  Not on ADP-2410 Tape.           241-245       L
                                            See para 3

Fourth Interchangeable Part Number 4        Not on ADP-2410 Tape.           246-266       L
                                            See para 3

Manufacturer Code Number 4                  Not on ADP-2410 Tape.           267-271       L
                                            See para 3

Fifth Interchangeable Part Number 5         Not on ADP-2410 Tape.           272-292       L
                                            See para 3

Manufacturer Code Number 5                  Not on ADP-2410 Tape.           293-297       L
                                            See para 3

Number of Repairs on each Serial                                            298-300       R
  Number (RN)

End of Tape Indicator                       Not on ADP-2410 Tape.           301           R
                                            See para 22
16. Within each overhaul group (PO), sort by Age Group (AG).

Request the sort to allow the lowest Age Group (1) to be set up to be processed first.

Location of (AG) is columns 108-109.
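
The Age Group rule given in Table 1, AG = (1/50)(IA + 25.01), rounded to the nearest integer and set to 1 when zero or negative, can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and IA is taken as an already-computed input because this section does not show how it is derived from PO, USN, ULI and USO.

```python
import math


def age_group(ia: float) -> int:
    """Age Group per Table 1: AG = (IA + 25.01) / 50, rounded, floor of 1.

    `ia` is assumed to be supplied by the caller; its derivation from
    PO, USN, ULI and USO is not given in this section.
    """
    raw = (ia + 25.01) / 50.0
    # Round half up explicitly; Python's round() rounds halves to even.
    rounded = math.floor(raw + 0.5)
    # "If AG is zero or negative, set it = 1."
    return rounded if rounded > 0 else 1


if __name__ == "__main__":
    for ia in (-100.0, 0.0, 50.0, 100.0):
        print(ia, age_group(ia))
```

The result is a small positive integer, which fits the two-column field (108-109) reserved for AG on the work tape.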

17. Within (AG), sort by Serial Number, so that the lowest Serial Number is first.

Location of Serial Number is columns 57-72. This is Component Serial Number.

18. Within each (AG), count the number of records (RN) having the same serial number. Insert this number into columns 58-60 on each (RN) record. Then, within each Age Group (AG), arrange the records so that those records with the lowest (RN) will be processed first.

19. Find the Unfailed Items and the Flying Hours on each one, as follows.

Within each Serial Number Group, find the highest (RID), which is in columns 77-80, and the highest (ODNR), which is in 81-84.

a. If this RID is greater than ODNR, then this record represents an unfailed item. This is an item that has been installed, but has not failed. In general, ODNR will be blank or zeroes when RID > ODNR.

b. Now that we have found an unfailed item, we must find the number of hours logged on this unfailed item by the end of the calendar period.

c. This procedure is as follows:

(1) For the serial number of this unfailed item, pick off the last installation date (RID) from columns 77-80. Also pick off the Tail Number from columns 158-167.

(2) With these 2 inputs, search the 2408-3 tape file to find the hours logged (OFH) on the aircraft at the time of installation (RID) of the component. On the 2408-3 tape, the tail number is at Block 4, Columns 14-23, Card "A", and the Julian date (RID) is found at Block 11N, Card "C", Columns 31-34; the OFH is at Block 11K, Card "C", Columns 16-20.

(3) Then, to find the number of hours (FFH) on the component at the end of the calendar period (DEP), it is necessary to search the 2408-3 tape again for this tail number and DEP. The DEP equals the Cut-off Date (COD), in columns 44-47 of the RD tape, for the RD1-CH-47-CGR tape, and is equal to COD less one year for the RD1-CH-47-CGO tape.

(4) The procedure is similar to the previous search, but (DEP) is used instead of (RID), and FFH replaces OFH.

(5) Then the usage of hours for the unfailed item (UFH) is given below:

UFH = FFH - OFH

(6) Knowing (UFH), a new record (an artificial 2410 record) is generated for the Unfailed Item. It is a reproduction of the record in which the maximum (RID) was found, with the following changes, "a" and "b".

(a) The newly found value of UFH is put into columns 89-92 of the RD tape.

(b) Replace the original ULI of columns 100-103 by 9999. This is necessary to ensure that the Unfailed Items will be processed at the end of each Age Group.
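
The unfailed-item logic of paragraph 19 can be sketched as follows. This is a minimal illustration under stated assumptions, not the program described in the report: the `Record` class and its field names are hypothetical, Julian dates are treated as plain integers with 0 standing for a blank or zero-filled field, and the two hour readings (OFH and FFH) are assumed to have been looked up already on the 2408-3 tape.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

UNFAILED_ULI = 9999  # sentinel from step (6)(b): forces unfailed items last


@dataclass(frozen=True)
class Record:
    serial_number: str
    rid: int            # date of re-install (columns 77-80), 0 if blank
    odnr: int           # removal date, no re-install (columns 81-84), 0 if blank
    uli: int            # hours usage since last installed (columns 100-103)
    ufh: Optional[float] = None  # unfailed flying hours (columns 89-92)


def unfailed_record(records: List[Record], ffh: float, ofh: float) -> Optional[Record]:
    """Return the artificial record for one serial-number group, or None.

    `records` holds every record sharing one serial number. `ofh` is the
    aircraft hour reading at installation and `ffh` the reading at the end
    of the calendar period (both assumed to be supplied by the caller).
    """
    latest_install = max(records, key=lambda r: r.rid)
    highest_removal = max(r.odnr for r in records)
    if latest_install.rid <= highest_removal:
        return None  # the last installation was followed by a removal
    # UFH = FFH - OFH; copy the max-RID record with the two changes (a), (b).
    return replace(latest_install, ufh=ffh - ofh, uli=UNFAILED_ULI)


if __name__ == "__main__":
    group = [Record("S1", 7100, 7150, 40), Record("S1", 7200, 0, 55)]
    print(unfailed_record(group, ffh=930.0, ofh=800.0))
```

Comparing Julian dates as integers is a simplification; dates that span a year boundary would need a proper date conversion first.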

20.  Within  each  Age  Group,  sort  by  Usage  since  last  installation, 
(ULI)  of  columns  100-103.  The  lowest  value  of  ULI  must  be  processed  first. 

21.  Within  each  Age  Group,  sort  by  UFH.  The  highest  UFH  must  be 
processed  last. 
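
The nested sorts of paragraphs 16, 20 and 21 amount to ordering records by a composite key. The sketch below is one assumed reading of that ordering (overhaul group, then age group, then ULI, then UFH); the dictionary keys are hypothetical names, and the part-number, serial-number and RN sorts of the other paragraphs are left out for brevity.

```python
from operator import itemgetter
from typing import Dict, List


def processing_order(records: List[Dict]) -> List[Dict]:
    """Order records by PO, then AG, then ULI, then UFH, lowest first.

    Unfailed items carry ULI = 9999, so they fall to the end of their
    Age Group, and among them the highest UFH comes last.
    """
    return sorted(records, key=itemgetter("po", "ag", "uli", "ufh"))


if __name__ == "__main__":
    sample = [
        {"id": "c", "po": 0, "ag": 1, "uli": 9999, "ufh": 210},
        {"id": "a", "po": 0, "ag": 1, "uli": 120, "ufh": 0},
        {"id": "b", "po": 0, "ag": 1, "uli": 9999, "ufh": 95},
        {"id": "d", "po": 1, "ag": 1, "uli": 30, "ufh": 0},
    ]
    print([r["id"] for r in processing_order(sample)])
```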

22.  On  the  last  record  enter  "9"  in  column  301,  following  completion 
of  all  sortings. 

23.  Print  a  listing  of  each  tape. 

24.  Store  each  tape  for  future  use,  and  notify  this  office  of  its 
identification  tag,  and  procedure  for  recall. 

CONCLUSIONS. This report shows that a test has been designed which will analyze the effect of overhauls, repairs, modifications of design (MWO), and engineering change proposals (ECP), for interchangeable items, on Time Between Overhaul (TBO), Reliable Life, Reliability, Burn-In Time, and Hazard Rate. These parameters will adjudge contractor compliance.

An assumption of Weibull's family of distributions is made.

A second assumption is that items having the same number of overhauls and number of repairs, and age at the time of installation, will have no more than three drastic changes of failure rate. These sudden changes characterize the three life phases: 1) Burn-In Time, 2) Random Failure Phase, and 3) Wear Out Phase.

The  program  further  tests  the  adequacy  of  the  Army  Reporting  System 
(TAERS). 

The  major  mathematical  contribution  is  the  development  of  formulas 
for  the  Weibull  Probability  Distribution  parameters.  As  a  result  of  this 
development,  it  will  no  longer  be  necessary  to  laboriously  plot  data, 
repeatedly  in  a  trial  and  error  program,  to  achieve  graphical  results. 

The entire Fortran program is being computerized by the Research and Development Division of RD&E Directorate, and the Special Studies Office of AVCOM. The Automatic Data Processing Office of AVCOM is preparing the data tape of the TAERS information.
A TECHNIQUE FOR INTERPRETING HIGH ORDER INTERACTIONS

INTRODUCTION. The detection and interpretation of high order interactions has been quite difficult in the recent past. This has been primarily due to the large number of calculations required to evaluate all of the single-degree-of-freedom contrasts in a typical experiment. Hence, short-cut formulae were used which often permitted significant high order interactions to slip by undetected.


The recent advent of very high-speed, large-core, third-generation computers, together with the availability of good statistical packages, has made adequate evaluation of interactions feasible. The only remaining aspect of the problem, and the topic of this paper, is the development of a logical and systematic procedure for ferreting out the essential pieces of information which will lead to a valid interpretation of interaction.

CONCEPT. The proposed procedure for isolating and interpreting high-order interaction is based upon a sequential elimination of the factor levels which are not primarily involved in the interaction. A least squares program is used to fit coefficients to a complete set of orthogonal contrasts among the treatment levels of the factorial. In addition, similar analyses are developed on subsets of the data. These subsets are those data within a given level of each factor of the entire experiment.

The computer output from a typical least-squares regression program is normally displayed in ANOVA table form. Each single-degree-of-freedom contrast is listed with an F test of its significance. These F tests are studied to determine which factor levels are involved in a particular interaction. First, the complete analysis is scanned for the highest significant interaction and also for low order interactions which are not components of higher order interactions. Subsequently, the subset analyses are studied to find which factor levels contribute to the high order interaction. Once the contributing factors are determined, the interaction can be resolved by graphical means.

FIELD AND TREATMENT DESIGNS OF THE EXPERIMENT. The procedures and techniques discussed herein can be readily adapted to a wide variety of field and treatment designs. For purposes of example, however, a factorial experiment in a randomized complete block field design is used. To put the example into context, the analysis of variance table is given in Table 1. The 64 treatments of this design are those of a 2^2 x 4^2 factorial treatment design. The 2-level factors are temperature (T), 25°C, 28°C and relative humidity (H), 20%, 80%. The 4-level factors are age (A), 28, 50, 70, 93 hours and populations (P), V, F, C, W of Drosophila melanica. Population V is from Norfolk, Virginia, F from Forest Park, Missouri, C from Cliff, New Mexico and W from Walnut Creek, Arizona.

The yield variable throughout this example is the respiratory rate of samples of ten Drosophila melanica pupae.

Table 1. Factorial Arrangement of Treatments for a Four-Factor Design (T,H,P,A).

                                                  Degrees of Freedom
Source              Degrees of Freedom            for the Example

Blocks (B)          b-1                           3
Temperature (T)     t-1                           1
Humidity (H)        h-1                           1
TH                  (t-1)(h-1)                    1
Population (P)      p-1                           3
TP                  (t-1)(p-1)                    3
HP                  (h-1)(p-1)                    3
THP                 (t-1)(h-1)(p-1)               3
Age (A)             a-1                           3
TA                  (t-1)(a-1)                    3
HA                  (h-1)(a-1)                    3
THA                 (t-1)(h-1)(a-1)               3
PA                  (p-1)(a-1)                    9
TPA                 (t-1)(p-1)(a-1)               9
HPA                 (h-1)(p-1)(a-1)               9
THPA                (t-1)(h-1)(p-1)(a-1)          9
Error               (b-1)(thpa-1)                 189
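
The degrees of freedom in Table 1 follow mechanically from the number of levels of each factor, so they are easy to check by machine. The short sketch below recomputes them for the example (four blocks, T and H at two levels, P and A at four); the variable and function names are illustrative only.

```python
from math import prod

# Levels of each treatment factor in the example, and the number of blocks.
levels = {"T": 2, "H": 2, "P": 4, "A": 4}
blocks = 4

terms = ["T", "H", "TH", "P", "TP", "HP", "THP", "A",
         "TA", "HA", "THA", "PA", "TPA", "HPA", "THPA"]


def term_df(term: str) -> int:
    """Degrees of freedom of a main effect or interaction: product of (levels - 1)."""
    return prod(levels[factor] - 1 for factor in term)


df = {term: term_df(term) for term in terms}
treatment_df = sum(df.values())                           # one per treatment contrast
error_df = (blocks - 1) * (prod(levels.values()) - 1)     # (b-1)(thpa-1)

if __name__ == "__main__":
    for term in terms:
        print(f"{term:5s}{df[term]:3d}")
    print("Treatments", treatment_df, "Error", error_df)
```

The treatment degrees of freedom sum to 63, one for each of the single-degree-of-freedom treatment contrasts used in the regression.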
Single degree of freedom contrasts must be developed for the main effects and interactions of the factorial model. First, contrasts are defined among the four main effects and the blocks. For the 2-level factors the contrast is simply +1 for the high-level and -1 for the low-level. For the 4-level factors, however, three contrasts need to be defined for each factor. For instance, our example has four populations, two of which are from the arid Southwest and two from the forested eastern half of the continent. Since three orthogonal contrasts are needed and even though any set will suffice for determination of sums of squares, a logical set of contrasts might be:

(1, 1, -1, -1)

(1, -1, 0, 0)

(0, 0, 1, -1)

where the first vector contrasts the non-arid populations with the arid ones, the second contrasts the non-arid populations and the third contrasts the arid populations. These contrasts are represented respectively as P1, P2 and P3.

When meaningful logical contrasts are not obvious, orthogonal polynomial coefficients can be used with adequate results since any set will produce the correct sum-of-squares. For example, the following set of vectors were used for the blocks contrasts as well as for age contrasts:

(3, 1, -1, -3)  linear
(1, -1, -1, 1)  quadratic
(1, -3, 3, -1)  cubic,

where the vectors are represented by B1, B2 and B3 for blocks and by A1, A2 and A3 for ages.
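
A set of contrast vectors is usable here only if each vector sums to zero and every pair has a zero inner product. The following sketch checks both conditions for the two sets quoted in the text; the helper names are illustrative and not part of the original analysis.

```python
from itertools import combinations
from typing import Iterable, Sequence

# Contrast vectors as quoted in the text (population order V, F, C, W).
population_contrasts = {"P1": (1, 1, -1, -1), "P2": (1, -1, 0, 0), "P3": (0, 0, 1, -1)}
polynomial_contrasts = {"linear": (3, 1, -1, -3),
                        "quadratic": (1, -1, -1, 1),
                        "cubic": (1, -3, 3, -1)}


def dot(u: Sequence[int], v: Sequence[int]) -> int:
    """Inner product of two coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_contrast_set(vectors: Iterable[Sequence[int]]) -> bool:
    """True when every vector sums to zero and all pairs are orthogonal."""
    vectors = list(vectors)
    each_is_contrast = all(sum(v) == 0 for v in vectors)
    pairs_orthogonal = all(dot(u, v) == 0 for u, v in combinations(vectors, 2))
    return each_is_contrast and pairs_orthogonal


if __name__ == "__main__":
    print(is_orthogonal_contrast_set(population_contrasts.values()))
    print(is_orthogonal_contrast_set(polynomial_contrasts.values()))
```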

Orthogonal single-degree-of-freedom interaction contrasts can be readily developed by taking all possible products of the already defined main-effect contrasts. For example, the 3 x 3 = 9 PA interaction contrasts are found by multiplying the elements of each of the three contrast vectors for P with each of the three contrast vectors for A. This procedure can be extended directly to the higher order interactions as well; e.g., the nine TPA contrasts may be developed by multiplying the T contrasts with each of the newly found PA contrasts. Of course, interaction contrasts with blocks do not have to be found because in the linear interaction model they all have expectation of zero. Hence they are valid error components and they can be evaluated by subtracting the blocks and treatments sum of squares from the total sum of squares.
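
The product construction can be illustrated directly. In the sketch below each PA interaction contrast is the element-by-element product of one P vector and one A vector, laid out over the 16 population-by-age cells; the cell ordering (population varying slowest) and the helper names are assumptions made for the illustration.

```python
from itertools import combinations, product
from typing import Dict, Sequence, Tuple

P = {"P1": (1, 1, -1, -1), "P2": (1, -1, 0, 0), "P3": (0, 0, 1, -1)}
A = {"A1": (3, 1, -1, -3), "A2": (1, -1, -1, 1), "A3": (1, -3, 3, -1)}


def interaction(u: Sequence[int], v: Sequence[int]) -> Tuple[int, ...]:
    """Coefficients over all (level of u, level of v) cells, u varying slowest."""
    return tuple(a * b for a, b in product(u, v))


# The 3 x 3 = 9 PA contrasts, named as in the text (P1A1, P1A2, ...).
PA: Dict[str, Tuple[int, ...]] = {
    p_name + a_name: interaction(p_vec, a_vec)
    for (p_name, p_vec), (a_name, a_vec) in product(P.items(), A.items())
}


def dot(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(x * y for x, y in zip(u, v))


all_orthogonal = all(dot(u, v) == 0 for u, v in combinations(PA.values(), 2))

if __name__ == "__main__":
    print(len(PA), "PA contrasts; mutually orthogonal:", all_orthogonal)
    print("P3A1 =", PA["P3A1"])
```

Applying the same function once more, with the two-element T contrast (1, -1) as the first argument, gives the nine TPA contrasts over the 32 temperature-by-population-by-age cells.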

Once the orthogonal contrasts have been defined, they can be used as independent variables in a multiple regression analysis. Most statistical packages include a least-squares program which will accomplish the necessary calculations.
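
Because the contrast columns are mutually orthogonal, each regression coefficient and its single-degree-of-freedom sum of squares can be computed independently of the others. The sketch below shows that calculation for one contrast column; the data are made up for illustration and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def contrast_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares coefficient and sum of squares for one orthogonal contrast.

    x holds the contrast coefficient attached to each observation and y the
    observed yields. With orthogonal columns, b = sum(xy) / sum(x^2) and
    SS = (sum(xy))^2 / sum(x^2).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx, sxy * sxy / sxx


def f_statistic(contrast_ss: float, error_mean_square: float) -> float:
    """F for a single-degree-of-freedom contrast: its SS over the error mean square."""
    return contrast_ss / error_mean_square


if __name__ == "__main__":
    x = (1, 1, -1, -1)          # a two-group contrast
    y = (3.0, 5.0, 1.0, 3.0)    # made-up yields
    b, ss = contrast_fit(x, y)
    print(b, ss, f_statistic(ss, error_mean_square=0.5))
```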

The abbreviated ANOVA table, Table 2, is the result of a regression analysis of respiration rates upon the 66 orthogonal contrasts (3 for blocks and 63 for the treatments of the 2^2 x 4^2 factorial). Only those contrasts with F values greater than 4.00 are tabulated. The large array of significant interaction is particularly alarming, especially when a 4-factor interaction is highly significant. The first reaction is, "Who missed a decimal point in a couple of data cards?" Since this is not the case, an interpretation is required. The various interactions can be broken into three general categories for discussion. The first group is composed of the interactions which are not components of the highest order interaction, the second is the highest order interaction itself and the third group is composed of the lower order interactions which are components of the highest order interaction.


Table 2. Significant* Contrasts for the Complete 2^2 x 4^2 Experiment.

                  Contrasts     F Statistic

Main Effects      T             257.VI
                  H             6.16
                  P1            6.45
                  P3            30.02
                  A1            706.45
                  A2            452.66
                  A3            8.81

Interactions      TA1           44.64
                  TA2           11.44
                  TA3           6.23
                  HP2           8.98
                  P1A1          25.93
                  P3A1          7.70
                  THP1          11.05
                  THP2          8.15
                  THA3          12.77
                  HP2A1         8.68
                  THP2A1        11.73

*Contrasts with F less than 4.00 are not tabulated.

Interactions in the first group such as P3A1, which is not a component of THP2A1, can be easily resolved by graphical techniques. Considering that the contrast P3 is the comparison of Cliff vs. Walnut and that P3A1 does not involve T or H, the mean respiration rate averaged over all temperature and humidity levels was determined for the eight combinations of Cliff and Walnut with age. These are plotted in Figure 1 with respiration rate on the ordinate and age on the abscissa. While the lines for Cliff and Walnut are essentially parallel at the younger ages, they do diverge considerably at age 93. This divergence, of course, is what we detect by the significant F for P3A1. Thus we have resolved P3A1.

A very basic part of the interpretation of high order interactions such as THP2A1 is identification of the particular levels of the effects which are primary contributors to the interaction. To aid in detecting these critical levels, sub analyses were performed on the 3-factor factorials within each level of each main effect. For instance, an analysis was performed on the 3-factor factorial where temperature is 28°C. Taking each factor in turn we see the following:

1. Within temperature levels the 25°C data exhibit only two 2-factor interactions while the 28°C data exhibit eight 2 and 3-factor interactions. Table 3.


Table 3. Significant Contrasts for the 3-Factor Experiment within
Temperature Levels.


Main Effects


Interactions


Contrast

F Statistic

25°C

28°C

P1

4.04 

P3 

7.79 

27.63 

A1 

191.31 

634.23 

A2 

293.79 

183.57 

A3 

17.11 

HP1

15.64 

HP2 

19.64 

HA1 

5.84 

HA3 

12.59 

P1A1 

8.89 

19.93 

P1A2 

4.58 

P3A1 

5.98 

HP1A1 

4.44 

HP2A1 

23.27 

2. Within humidity levels we see that both levels exhibit a considerable
number of interactions. Table 4.


Table 4. Significant Contrasts for the 3-Factor Experiments within
Humidity Levels.


Main Effects


Contrasts

F Statistic

20% R.H.

80% R.H.

T 

184.11 

103.20 

P1

14.17 

P2 

17.59 

P3 

17.73 

14.28 

A1 

560.85 

253.30 

A2 

314.44 

186.33 
Table 4. (Continued)


Interactions


Contrasts  F Statistic


20% R.H.

80% R.H.

A3 

14.56 

TP1 

8.51 

4.10 

TP2

5.02 

TA1 

52.38 

9.28 

TA2 

11.57 

TA3 

14.65 

P1A1 

25.40 

7.16 

P2A1 

6.11 

P3A1 

4.85 

TP1A1 

5.39 

TP2A1 

5.74 

6.44 

3. Within the populations we see that the Cliff and Walnut strains
exhibit only 2-factor interactions while the Virginia and Forest
Park strains are both involved in 3-factor interactions. Table 5.


Table 5. Significant Contrasts for the 3-Factor Experiments within
Populations.

Contrast

F Statistic

Virginia

Forest Park

Cliff

Walnut

T 

259.26 

29.89 

81.89 

74.67 

H 

9.63 

A1 

886.49 

129.71 

163.06 

119.71 

A2 

422.26 

77.75 

108.71 

125.47 

A3 

7.93 

4.29 

7.22 

TH 

6.92 

4.86 

TA1 

61.21 

5.71 

10.74 

10.63 

TA2 

4.26 

6.44 

4.86 

HA1 

5.24 

HA2 

5.69 

THA1

9.65 

THA3 

7.53 
4. Within the four age levels we find that ages 28, 50 and 70 are not
involved in 3-factor interactions while the 93 hour data have
significant 3-factor interactions. Table 6.

Table 6. Significant Contrasts for the 3-Factor Experiment within
Age Levels.

Contrast  F Statistic


28 hrs.

50 hrs.

70 hrs.

93 hrs.
T 

7.13 

57.67 

151.93 

79.95 

H 

6.94 

P1

26.64 

P3 

7.30 

19.80 

TH 

6.96 

6.92 

HP2

11.64 

THP1

8.40 

THP2 

15.80 

Thus, it appears that attention should be focused upon the 28°C, 93 hour data from the Forest Park and Virginia strains. The interacting 93 hour data and the non-interacting 50 hour data are demonstrated graphically in Figure 2. It is apparent from these two graphs that the Forest Park and Virginia strains respond differently at the two humidity levels when the temperature is at 28°C. Conversely, when the temperature is at 25°C the response curves are parallel. Thus, we have resolved the 4-factor interaction.
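
The contrast between the two graphs can also be put in numbers. Using the treatment means listed with Figure 2, the sketch below computes the humidity effect (80% minus 20%) for each strain and then the difference between the two strains; a value near zero corresponds to parallel lines. The function name and data layout are illustrative only.

```python
from typing import Dict, Tuple

# Treatment means listed with Figure 2, keyed by (population, % relative humidity).
means_28C_93hr: Dict[Tuple[str, int], float] = {
    ("V", 20): 1.31, ("V", 80): 1.23, ("F", 20): 1.07, ("F", 80): 1.51,
}
means_25C_50hr: Dict[Tuple[str, int], float] = {
    ("V", 20): 0.50, ("V", 80): 0.45, ("F", 20): 0.56, ("F", 80): 0.50,
}


def humidity_by_population(means: Dict[Tuple[str, int], float]) -> float:
    """Humidity effect for Virginia minus humidity effect for Forest Park."""
    v_effect = means[("V", 80)] - means[("V", 20)]
    f_effect = means[("F", 80)] - means[("F", 20)]
    return round(v_effect - f_effect, 2)


if __name__ == "__main__":
    print("28 C, 93 hr:", humidity_by_population(means_28C_93hr))
    print("25 C, 50 hr:", humidity_by_population(means_25C_50hr))
```

At 28°C and 93 hours the two strains move in opposite directions with humidity, while at 25°C and 50 hours their humidity effects are nearly identical.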


The analyses within age levels indicated that the three youngest ages were involved in only a very few interactions. Thus, we decided to reanalyze the data as a 2^2 x 3 x 4 factorial by eliminating age 93 from the analysis. The results of this analysis, given in Table 7, are quite enlightening. Only two interactions, THA2 and TA1, are really significant. The third interaction, THP1, has an F of only 4.13 which is right at the critical value of F and will be ignored. We also note that the lower order interaction, TA1, is a component of THA2. The most striking result of this 2^2 x 3 x 4 analysis is the complete disappearance of the 4-factor interaction, which verifies that its significance is in fact due to a failure of the 93 hour data to conform with the data from all three other ages.

Table 7. Significant Contrasts for the 4-Factor Experiment after Omitting
Data from Age Level 93.

                  Contrasts     F Statistic

Main Effects      T             178.40
                  H             4.20
                  P3            12.80
FIGURE 2. GRAPHICAL DISPLAY OF THE 4-FACTOR INTERACTION THP2A1.

INTERACTING GRAPH

TREATMENT COMBINATION      MEAN
28°C, 20%, V, 93hr         1.31
28°C, 80%, V, 93hr         1.23
28°C, 20%, F, 93hr         1.07
28°C, 80%, F, 93hr         1.51

TREATMENT COMBINATION      MEAN
25°C, 20%, V, 50hr         0.50
25°C, 80%, V, 50hr         0.45
25°C, 20%, F, 50hr         0.56
25°C, 80%, F, 50hr         0.50

Abscissa of both graphs: RELATIVE HUMIDITY (%), at 20 and 80.
Table 7. (Continued)

                  Contrasts

Main Effects      A1
                  A2

Interactions      TA1
                  THP1
                  THA2

The 2^2 x 3 x 4 analysis also points out an interaction of the third type; namely a lower order interaction, THA2, which is a component of a higher order interaction, THP2A1. Because the 4-factor interaction involved age 93 and because the interaction THA2 is significant throughout the remainder of the experiment, we should determine the implications of THA2. Populations are not involved in THA2 so we can plot (Figure 3) the respiration rate, averaged over populations, against age for the four combinations of temperature and humidity. The two 28°C curves are similar whereas the two 25°C curves are quite divergent from each other and also from the 28°C curves. This figure quite adequately demonstrates the response function for the three youngest ages. Because the interactions with population were nonsignificant in the 2^2 x 3 x 4 analysis, we can infer that the response curve of each population is similar in shape to the response curves in Figure 3.

In conclusion, a procedure is outlined for isolating high-order interactions and developing their logical interpretation. Procedures are also outlined for identifying and interpreting two types of lower order interaction, those which are components of the higher order interaction and those which are not. The analysis is based on a least squares fit to single-degree-of-freedom contrasts and a subsequent graphical display of the significant contrasts. The key to the method, however, is the ability to isolate the critical factors by taking advantage of the computer's ability to easily and inexpensively reanalyze various subsets of the data.
Wood-Ivey Systems Corporation
Winter Park, Florida

ABSTRACT. This paper presents a simplified method for determining an optimum experimental configuration that most nearly satisfies the experimenter's requirements. Although the Lagrangian multiplier method can be used to find a specific experimental design with nearly minimum variance subject to cost restrictions, the experimenter's flexibility is limited and the calculations are laborious. By use of the simple computer program given in this paper, the objective and cost functions can be readily evaluated for numerous feasible combinations. The distinct advantage of the latter technique is that the experimenter is able to choose that design which most nearly fits his experimental needs.

INTRODUCTION. The success or failure of an experiment is normally determined during the planning phase of the research. Success of a particular experimental design is essentially dependent on the design's ability to test adequately certain hypotheses or to estimate certain effects accurately. This paper considers efficient experimental designs from the standpoint of optimum choice of factor levels once the basic design type has been determined. An exemplary problem is solved with the aid of a very simple computer program.

The basic design type, such as a completely random design, a randomized complete block design, or a split-plot design, is determined to a large extent by design restrictions. For instance, you can't change cameras in a reconnaissance aircraft during flight nor can pilots be switched. Similarly, it is not usually possible to completely randomize aircraft speeds or altitudes due to obvious restrictions, both legal and technical. Because the basic design type is usually prescribed in one way or another, we will restrict our attention to selection of the number of levels of each of the component factors for a split-plot design. Although a large number of combinations are essentially equivalent, a very poor design may often be developed if the experiment is not adequately planned. Since many experimental designs cannot be easily changed during the conduct of the experiment, great care needs to be exercised during the design. If a change is made late in the experimental process, a considerable waste of effort and of efficiency is usually experienced.

The mere fact that we are concerning ourselves with an optimum design suggests that some trade-offs must be made. Usually these trade-offs are precision versus the cost of performing the experiment. The objective functions with which we are normally concerned are not expressed in common units. This considerably complicates matters when we get to the point where we wish to solve for an optimum solution. The objective functions of design efficiency are normally expressed in terms of variance components and the design parameters. The cost function, on the other hand, is typically a function of dollar or hour cost and of the design parameters. An ideal design, of course, would be one which minimized the cost function. Naturally some compromise must be made. Hence, a combination of design parameters must be found that gives near minimum variance (or at least a tolerable variance) for the smallest cost consistent with design needs.

METHOD. To develop a desirable design, several essential steps must be followed. For purpose of this presentation, we will first outline a systematic procedure in seven major steps. Subsequently, we will follow this procedure through to completion with an example from reconnaissance research. The essential steps are:

1. State the hypotheses to be tested and identify the effects to be
estimated.

2. Develop a linear mathematical model of the yield variable in terms
of the factors of the design. Of course, this model must be such
that it will provide test statistics capable of testing the hypotheses
stated in Step 1. Furthermore, it must also provide estimators for
any effects that must be estimated.

3. Develop an ANOVA table based on the model. Work out the expectations
of the mean squares.

4. Develop an objective function for each hypothesis that is to be tested
and one for each effect that is to be estimated. These functions will
typically be functions of the design parameters and of the variance
components. Hence, a priori estimates of the variance components must
be developed. Often these estimates can be derived from similar
previously performed research projects.

5. Develop a cost function based on the project's design parameters and
their respective unit costs.

6. Solve the set of objective functions for an optimum solution. Since
the functions are antagonistic and require integer solutions, a
computerized evaluation of the objective functions for feasible
combinations of design parameters is recommended.

7. Select the combination of design parameters that most nearly minimizes
the objective functions within the budgetary restrictions of the
project.

An example from aerial photographic reconnaissance will be used to
demonstrate this procedure. Only the identities of the aircraft and its cameras
have been changed for security purposes. Oh yes, the velocities and data are
also fictitious for the same reason. Several "sophisticated" aircraft will be
available to fly photographic missions during this research period. Due to
commitments we have s sorties assigned to this project, where s is somewhat
negotiable. One serious restriction, however, is that a particular aircraft
cannot be guaranteed for a fixed number of sorties. Thus, sorties will be
considered as blocks. All aircraft will be fitted with an "Advanced, Model
Al-Mod 3" camera for this test. Each sortie can reliably produce 12 images
of the target complex. We are interested in evaluating image quality at four
different velocities, namely slow, fast, very fast and full throttle (all
afterburners on). The latter one for obvious reasons. Hence, the four levels
of speed are our treatments of primary interest. Since 12 images can be
secured during a sortie, the 4 velocities will be replicated 3 times within
each sortie. The resulting photographic images are to be evaluated by p
photographic interpreters. Each of the p interpreters will be required to
evaluate the images d times, on different days, of course, to avoid an
among-successive-evaluations variance of zero.

In this example, a number of tests of hypotheses are of interest. However,
to simplify the presentation, we will only consider efficiency for the
hypothesis, $\tau_j - \tau_{j'} = 0$. The basic design structure that we have
just outlined will provide a test statistic for this hypothesis. Each of the
individual measurements upon the imagery can be described by the model:

$$Y_{ijklm} = \mu + \beta_i + \tau_j + (\beta\tau)_{ij} + \delta_{ijk} + \gamma_l + (\gamma\delta)_{ijkl} + \varepsilon_{ijklm}$$

$$i = 1, \ldots, s; \quad j = 1, 2, 3, 4; \quad k = 1, 2, 3; \quad l = 1, \ldots, p; \quad m = 1, \ldots, d,$$

where $\mu$ is a constant, $\beta_i$ is the effect of the ith sortie where the
$\beta_i \sim NID(0, \sigma_\beta^2)$, and $\tau_j$ is the effect of the jth
velocity where $\sum_j \tau_j = 0$. $(\beta\tau)_{ij}$ is an additional effect
due to the specific combination of the jth speed during the ith sortie where
the $(\beta\tau)_{ij} \sim NID(0, \sigma_{\beta\tau}^2)$. $\delta_{ijk}$ is a
sampling error within the same sortie-velocity combination where the
$\delta_{ijk} \sim NID(0, \sigma_\delta^2)$. $\gamma_l$ is the effect due to
the lth interpreter where the $\gamma_l \sim NID(0, \sigma_\gamma^2)$.
$(\gamma\delta)_{ijkl}$ is an effect due to the way the lth interpreter
evaluates the ijkth image where $(\gamma\delta)_{ijkl} \sim NID(0, \sigma_{\gamma\delta}^2)$.
$\varepsilon_{ijklm}$ is a sampling error among consecutive readings of the
lth interpreter on the ijkth image where
$\varepsilon_{ijklm} \sim NID(0, \sigma_\varepsilon^2)$. The ANOVA would appear
(in abbreviated form) as in Table 1.


The first term of the model gives us a line for the overall mean with one
degree-of-freedom. Upon closer scrutiny, it is quite obvious that the next
four terms of the model define what is commonly referred to as a whole-plots
analysis in a split-plot experiment with the s sorties as blocks and v = 4
velocities as treatments. The model indicates that interaction between sorties
and velocities is possible and that each sortie-treatment combination is
replicated r times. In this case, however, r = 3 since $v \cdot r = 12$.
Normally, of course, the block by treatment interaction is assumed to be zero.
In this case, however, sorties cannot be considered to be true blocks since
they are really another treatment; moreover in the real world, we wish to
estimate the component $\sigma_{\beta\tau}^2$. These first five lines are thus
the whole-plot part of the analysis. The photo interpreters are the split-plot
treatments and this leads, in turn, to the three line split-plot analysis of
photo interpreters, photo interpreters by whole-plot treatments interactions,
and sampling error among the split-plot units.


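To simplify the discussion somewhat, we restricted our attention to
design optimization for the test of the hypothesis, $\tau_j - \tau_{j'} = 0$.
This test can be made using a t-test; therefore, it is obvious that an optimum
design for this test can be achieved by minimizing the variance of the
difference between the two treatment means. Now the estimator for the contrast
$\tau_j - \tau_{j'}$ is

$$\hat\tau_j - \hat\tau_{j'} = \bar{Y}_{.j...} - \bar{Y}_{.j'...},$$

where the dot indicates summation over that subscript. The estimator
$\hat\tau_j - \hat\tau_{j'}$, written in terms of the model, is

$$\hat\tau_j - \hat\tau_{j'} = \frac{1}{srpd}\Big[\, srpd\,(\tau_j - \tau_{j'}) + rpd \sum_{i=1}^{s} \big((\beta\tau)_{ij} - (\beta\tau)_{ij'}\big) + pd \sum_{i=1}^{s}\sum_{k=1}^{r} (\delta_{ijk} - \delta_{ij'k})$$
$$\qquad + d \sum_{i=1}^{s}\sum_{k=1}^{r}\sum_{l=1}^{p} \big((\gamma\delta)_{ijkl} - (\gamma\delta)_{ij'kl}\big) + \sum_{i=1}^{s}\sum_{k=1}^{r}\sum_{l=1}^{p}\sum_{m=1}^{d} (\varepsilon_{ijklm} - \varepsilon_{ij'klm}) \Big].$$

Now the variance of $\hat\tau_j - \hat\tau_{j'}$ is:

$$V(\hat\tau_j - \hat\tau_{j'}) = \frac{1}{s^2 r^2 p^2 d^2}\Big[\, 2 s r^2 p^2 d^2\, \sigma_{\beta\tau}^2 + 2 s r p^2 d^2\, \sigma_\delta^2 + 2 s r p d^2\, \sigma_{\gamma\delta}^2 + 2 s r p d\, \sigma_\varepsilon^2 \Big].$$

Or, rewriting into the form of the EMS of the ANOVA table,

$$V(\hat\tau_j - \hat\tau_{j'}) = \frac{2}{srpd}\Big[\sigma_\varepsilon^2 + d\,\sigma_{\gamma\delta}^2 + pd\,\sigma_\delta^2 + rpd\,\sigma_{\beta\tau}^2\Big].$$

If we knew the actual values of the variance components $\sigma_\varepsilon^2$,
$\sigma_{\gamma\delta}^2$, $\sigma_\delta^2$, and $\sigma_{\beta\tau}^2$ we could
write out one of the objective functions that we wish to minimize. A priori
estimates must be found, by argument if necessary, to evaluate the objective
function. For many problems estimates can be derived from previously conducted
experiments.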
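
The agreement between the expanded variance and its EMS form can be checked numerically. The Python sketch below is only an illustration: the function and argument names are not from the paper, and any variance-component values passed in are placeholders chosen by the reader.

```python
def contrast_variance_ems(s, r, p, d, var_eps, var_gd, var_delta, var_bt):
    """Variance of the contrast in EMS form: 2/(srpd) * [ ... ]."""
    ems = var_eps + d * var_gd + p * d * var_delta + r * p * d * var_bt
    return 2.0 * ems / (s * r * p * d)


def contrast_variance_expanded(s, r, p, d, var_eps, var_gd, var_delta, var_bt):
    """Same variance, summed term by term before dividing by (srpd)^2."""
    total = (2 * s * r**2 * p**2 * d**2 * var_bt      # sortie x velocity
             + 2 * s * r * p**2 * d**2 * var_delta    # images within cell
             + 2 * s * r * p * d**2 * var_gd          # interpreter x image
             + 2 * s * r * p * d * var_eps)           # repeated readings
    return total / float((s * r * p * d) ** 2)
```

Calling both functions with the same arguments should give the same value up to rounding, and increasing s, p, or d should lower the result.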


The other objective function that we now have to develop is the cost
function. This can also be developed from the mathematical model in terms of
the design parameters s, r, p, d, and v plus the actual unit cost of each
additional level of the factors. Therefore, let:

$C_\beta$ = Cost for each sortie,
$C_\gamma$ = Cost for each photo interpreter, and
$C_\varepsilon$ = Cost of replicate photo interpretations.

Furthermore, since r and v are fixed, the cost for the experiment will be
$sC_\beta + pC_\gamma + svrpd\,C_\varepsilon$. The two functions which we wish
to minimize are thus:

$$f_1 = \frac{2}{3spd}\left(\sigma_\varepsilon^2 + d\,\sigma_{\gamma\delta}^2 + pd\,\sigma_\delta^2 + 4pd\,\sigma_{\beta\tau}^2\right),$$

and

$$f_2 = sC_\beta + pC_\gamma + 12\,spd\,C_\varepsilon.$$

The objective functions $f_1$ and $f_2$ are antagonistic because $f_1$ is a
decreasing function of the design parameters; whereas, $f_2$ is an increasing
function. A method of evaluating these functions is obviously needed. An
often used, however quite unsatisfying, method is the Lagrangian multiplier
method by which the variance function, $f_1$, is minimized subject to the cost
function, $f_2$, being equal to some fixed cost. Some criticisms of this method
are:

(1)  It  yields  non-integer  solutions. 

(2)  It usually requires the solution of very difficult equations.

(3)  It  does  not  reveal  nearly  optimal  solutions. 

(4)  It does not reveal the solutions with considerably smaller variance
at only a moderate increase in cost.

A very simple computer program provides a means for finding an optimum
integer solution. In fact, all of the previously mentioned criticisms of the
Lagrangian multiplier method are avoided. The only apparent difficulties with
this computerized method appear to be:

(1)  It requires some programming.

(2)  The computer output must be scanned visually to find the design
parameter combinations that most nearly satisfy the objective functions.

The latter difficulty, of course, could be a very problematical task if
several antagonistic objective functions are to be evaluated simultaneously.
Even this task is not too difficult if isobars are drawn in various colors on
the computer output sheets.

An example of a simple complete Fortran (WATFOR)* program which will
evaluate our example is:


* WATFOR is a Fortran IV compiler written for IBM 360 computers by the
University of Waterloo, Waterloo, Ontario.
C     I = SORTIES (S), L = INTERPRETERS (P), M = READINGS (D)
C     STATEMENTS 3 AND 6 ARE ILLEGIBLE IN THE SOURCE COPY.  THE
C     READ LIST, UNIT NUMBER AND FORMAT SHOWN ARE ASSUMED.
      DIMENSION OUT(8), OUT2(8)
    3 READ (1,6) CS, CP, CD, VE, VGD, VD, VBT
    6 FORMAT (7F10.0)
      DO 1 I = 1, 10
    7 FORMAT ('1')
C     START A NEW PAGE FOR EACH NUMBER OF SORTIES
      WRITE (3,7)
      DO 1 L = 1, 10
      DO 2 M = 1, 8
C     OUT HOLDS THE COST, OUT2 THE VARIANCE OF THE CONTRAST
      OUT(M) = I*CS + L*CP + 12*I*L*M*CD
    2 OUT2(M) = 2*(VE + M*VGD + L*M*VD + 4*L*M*VBT)/(3*I*L*M)
      WRITE (3,4) OUT
    1 WRITE (3,5) OUT2
    4 FORMAT ('0', 8F9.0)
    5 FORMAT (' ', 8F9.3)
      GO TO 3
      END

Just as a matter of interest, this program took less than three seconds
to compile and run on a WATFOR compiler with a 360/75 computer.
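
For readers without a Fortran IV compiler, the same tabulation can be sketched in Python. This is a rough port rather than the author's code: it follows the printed program, including its coefficient of 4 on the last variance term and the fixed r = 3 in the divisor. The helper `cheapest_design` is a hypothetical addition that stands in for the visual scan of the output, and all unit costs and variance components are supplied by the caller.

```python
def cost(s, p, d, cs, cp, cd):
    """Cost function f2 with v*r = 12 images per sortie."""
    return s * cs + p * cp + 12 * s * p * d * cd


def variance(s, p, d, ve, vgd, vd, vbt):
    """Variance function f1 exactly as coded in the printed program."""
    return 2.0 * (ve + d * vgd + p * d * vd + 4 * p * d * vbt) / (3 * s * p * d)


def tabulate(costs, comps, s_max=10, p_max=10, d_max=8):
    """Yield (s, p, d, cost, variance) over the same grid as the DO loops."""
    for s in range(1, s_max + 1):
        for p in range(1, p_max + 1):
            for d in range(1, d_max + 1):
                yield s, p, d, cost(s, p, d, *costs), variance(s, p, d, *comps)


def cheapest_design(costs, comps, max_variance):
    """Hypothetical helper: cheapest grid design meeting a variance target."""
    feasible = [row for row in tabulate(costs, comps) if row[4] <= max_variance]
    return min(feasible, key=lambda row: row[3]) if feasible else None
```

A call such as `cheapest_design((500.0, 200.0, 2.0), (0.2, 0.5, 0.2, 2.0), 0.9)` uses placeholder costs and variance components, not the paper's estimates. It returns `None` when no design on the grid meets the target.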

Table II is an example of a typical computer sheet. Lines, which represent
constant cost and constant variance, have been drawn through the tabulations
to aid in locating the optimum combination of design parameters. The upper
element of a pair is the value of the cost function, whereas the lower element
is the value of the objective variance function. A wide variety of designs
with similar costs yield essentially the same precision for the desired test.

For instance the design with

s = 2, v = 4, r = 3, p = 10, and d = 1 is comparable to the design
s = 2, v = 4, r = 3, p = 4, and d = 5.

Either of these designs will meet the basic criteria for optimization. We might
ask, however, whether a better design exists that has essentially the same
variance, lower cost and more flexibility. The design with

s = 2, v = 4, r = 3, p = 5, and d = 2

has a similar variance and it costs only 75% of either of the previously
mentioned designs. The flexibility of these designs must be considered
in making a final selection; nonetheless, care should be exercised to ensure
that the design is not extremely sensitive to inadequate a priori estimates of
the variance components in the objective function. This can be easily
accomplished by rerunning the program with a number of alternative sets of
a priori estimates.
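
The rerun with alternative a priori estimates can be sketched as below. This is a minimal Python illustration, not the author's procedure: the variance function repeats the form coded in the printed program, the names `sensitivity` and `spread` are hypothetical, and the alternative estimate sets are whatever the planner chooses to try.

```python
def variance(s, p, d, ve, vgd, vd, vbt):
    """Variance function f1 as coded in the printed program (r = 3)."""
    return 2.0 * (ve + d * vgd + p * d * vd + 4 * p * d * vbt) / (3 * s * p * d)


def sensitivity(design, estimate_sets):
    """Variance of one design (s, p, d) under each set of a priori estimates."""
    s, p, d = design
    return [variance(s, p, d, *estimates) for estimates in estimate_sets]


def spread(values):
    """Range of the variances; a small range suggests an insensitive design."""
    return max(values) - min(values)
```

Comparing `spread(sensitivity(design, sets))` across the candidate designs gives a simple numerical counterpart to inspecting several printed tables side by side.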

NOTE.  Tables  II,  III,  and  IV  use  the  following  a  priori  estimates: 


°S'C2  "  2 


V  "  0>5 

Cost per sortie = $500

Cost per photo reading = $2

Fixed cost per photo interpreter = $200

Tables III and IV illustrate the flexibility that the experimental
planner can acquire by using this programming method. Table III has the design
parameters s = 7, v = 4, and r = 3; whereas Table IV has s = 8, v = 4, and
r = 3. Of all the combinations in Table III with variance of 0.9, the design
with p = [illegible] and d = 1 has the smallest cost at $5980. From Table IV,
however, the variance can be maintained at 0.9 with a cost of only $5360 by
using p = 2 and s = 8.

Not only can we realize cost savings, but we can also attain a slight reduction
in the variance. The design's flexibility as well as its insensitivity to
inadequate a priori estimates must necessarily affect the final choice from
the candidate designs.


TABLE II - COST AND VARIANCE OF DESIGNS WITH s = 2, v = 4, and r = 3

    4200.    5400.    6600.    7800.    9000.   10200.   11400.   12600.
    2.967    2.933    2.922    2.917    2.913    2.911    2.910    2.908

   10800.   15600.   20400.   25200.   30000.   34800.   39600.   44400.
    0.742    0.733    0.731    0.729    0.728    0.728    0.727    0.727


TABLE III - COST AND VARIANCE OF DESIGNS WITH s = 7, v = 4, and r = 3
A DEFINITIVE CALIBRATION OF AN AERIAL CAMERA
IN  ITS  OPERATING  ENVIRONMENT 

Lawrence  A.  Gambino 

Research  Institute  for  Geodetic  Sciences 
U.S.  Army  Engineer  Topographic  Laboratories 
Fort  Belvoir,  Virginia 

INTRODUCTION. It should be appreciated that the calibration of an
aerial camera in its operating environment is more meaningful and effective
than a laboratory calibration. However, even though this principle has been
acknowledged by many scientists in this area of endeavor, the calibration of
aerial mapping cameras has almost universally been relegated to the laboratory.
In recent years, the ballistic camera has been used for recording flashes from
active earth bound satellites or recording reflecting type satellites, such
as the Echo Satellite, on photographic glass plates. The ballistic cameras
are successfully calibrated in their operating environment using the process
of stellar calibration. This has led to suggestions that the technique be
applied to aerial mapping cameras. A small amount of work has been expended
in calibrating aerial cameras using the stellar calibration technique.
However, as with the laboratory methods, this technique still suffers from its
failure to simulate the typical operational utilization and environment of
an aerial mapping camera; namely, photographing the ground through a camera
window located on the underside of a fast moving aircraft.

The experimental design necessary to calibrate an aerial camera in its
operating environment requires extensive knowledge of the scientific disciplines
of analytical, aerial photogrammetry, optics, and first order regression
processes. It is not the purpose of this paper to explain in detail each of
these scientific areas, but we will briefly discuss each of the mathematical
models necessary to carry out the experiment.

The photogrammetric model we will adopt has been used successfully in
recent years for analytical, aerial triangulation. Also, extensive effort
has been expended to develop a mathematical model which describes the
displacement of photographic images due to imperfect lenses. One such model
is called the Thin Prism Model, and it is used to describe the radial and
tangential components of distortion. Alternative models have been derived,
such as Conrady's Model, in the year 1919. However, Conrady's model does
not agree exactly with the Thin Prism Model. In any case, there have been
many investigations through the years concerned with this aspect of optics
and, notably, a very recent investigation was carried out by D. Brown [1]
whereby he developed a model through extensive analytical, three dimensional
ray tracing through a thin prism. Brown derived an analytical expression
defining the relationship between the radial and tangential distortion induced
by a thin prism at any specified azimuth. This can be considered as an
extension to Conrady's Model.

A third model which we must adopt has been well defined for many years
and it describes the displacement of an image symmetrically about the
optical axis. It has been found that the distortion of a perfectly centered
lens composed of flawless elements is symmetric about the optical axis.
This distortion is commonly referred to as symmetric radial distortion.

The remainder of this article has been photographically reproduced from the
author's copy.

With this all too brief narrative summary of photogrammetry and optics,
we may consider our final model to consist of three major components;
namely, symmetric radial distortion, decentering distortion, and the fundamental
projective equation, or colinearity equation. The colinearity equation describes
the fact that, with no distortion, the perspective center of the lens,
an image on the film and its corresponding point on the ground (object
space) all lie on the same straight line.

We will develop the first order regression process which makes
practical the solution of a would-be very large system of normal equations.
The first order regression process will encompass two sets of parameters
which will be referred to as stationary and nonstationary parameters.
The regression process which simultaneously recovers these sets of parameters
is referred to as Aerial SMAC, an acronym for Simultaneous Multistation
Analytical Calibration. We will develop the SMAC process to provide for
the introduction of external or a priori information associated with any
of the stationary and nonstationary parameters.

We shall also discuss in brief the necessary requirements of a
photogrammetric test range so that the calibration experiment can yield
the best possible recovery of the meaningful parameters resulting from a
rigorous data reduction process.

CALIBRATION  RANGE 


In order to carry out a definitive calibration of an aerial camera
in its operating environment, we must conduct the experiment by flying
over special target ranges where the horizontal and vertical position of
the targets are precisely known relative to each other. A small make-shift
3 by 5 mile range is available in the McClure, Ohio, area. This range
was used recently to conduct a SMAC experiment. As a matter of fact, the
range was turned into a night photogrammetric test range whereby 56, 500
watt, iodine quartz lamps were placed over the survey markers. Unfortunately,
the final results of this experiment are not yet available at the writing
of this paper.

From our model, we will see that the X, Y, Z position of each of these
precisely surveyed marks is taken as a known quantity. Any small error
in their position will be smaller than the noise level of the film
measurements at the scale of the photography. However, SMAC suffers the
disadvantage of being inherently incapable of yielding a calibration of
elements of interior orientation (focal length and principal point) of
the camera. It is well known that the variations in the elements of interior
orientation are projectively equivalent to changes in the $X^c$, $Y^c$, $Z^c$
coordinates of the aircraft. On the other hand, when external information
is available, a SMAC reduction is possible. As stated in the introduction,
the regression process will be developed whereby external information can
be introduced into SMAC. The necessary external information will come
from either electronic tracking devices, which will track the aircraft as
it flies over the test range, or from ballistic cameras observing a
flashing light on board the aircraft if the range is a night photogrammetric
test range. In either case, the electronic tracking devices, or ballistic
cameras, situated around the test range, will provide the $X^c$, $Y^c$, $Z^c$
position of the aircraft from an independent data reduction process. Let
it suffice to say that with rigorous data reduction processes, it is possible
to recover the position of the aircraft to within 2 feet, especially since
we are considering excellent geometry.

Figure 1 illustrates the type of permanent photogrammetric test range
to be used in the future and Figure 2 illustrates the flight patterns
over this range.

In order to provide the reader with some idea of the accuracies
which we hope to achieve, we will say that the film measuring accuracy
should be close to 5 microns and then the estimated elements of interior
orientation are expected to have standard deviations of approximately 2
microns. The standard deviations of the calibrated functions of radial
and tangential distortion are also expected to be approximately 2 microns.
It should be appreciated that these accuracies are achievable with only the
most rigorous data reduction process, precision measuring devices, and an
accurately surveyed test range.


SYMMETRIC  RADIAL  DISTORTION 

As stated previously, the distortion of a perfectly centered lens is
symmetric about the optical axis; that is, the distortion is symmetrical
about the principal point and therefore is a function of radial distance
only.

Figure 3 will give the reader an idea of the photographic coordinate
system with which we are dealing. From this figure, we obtain the concept
of what is meant by interior orientation. The vector from the perspective
center to the image point is defined as follows:

$$\begin{bmatrix} X - X^c \\ Y - Y^c \\ Z - Z^c \end{bmatrix}, \qquad \begin{bmatrix} x - x_p \\ y - y_p \\ 0 - f \end{bmatrix} \tag{1}$$
Figure 1. Geometry of Photography and Simultaneous Tracking (SHIRAN).
Illustrates  a  Day  Photogrammetric  Test  Range. 


Figure  2.  Cloverleaf  Flight  Path  Over  the  Calibration  Range. 
The quantity -f will be called the principal distance and it will be denoted
by the letter "c". Brown [3] has shown that the symmetric radial distortion
function must be one of two forms depending upon whether or not the
principal distance c is carried as an unknown in the calibration process.
For our purposes in the SMAC reduction, we wish to carry this parameter as
an unknown quantity. Therefore, the distortion model we will adopt is as
follows:

$$\delta = K_1 r^3 + K_2 r^5 + K_3 r^7 + \cdots, \tag{2}$$

where r is the radial distance from the principal point and the K's are the
coefficients of distortion. We will carry only three of these coefficients
in the SMAC reduction.
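
Equation (2), truncated to the three coefficients carried in the SMAC reduction, is simple to evaluate. The Python sketch below is illustrative only; the function name is not from the paper and any coefficient values are supplied by the caller.

```python
def radial_distortion(r, k1, k2, k3):
    """Symmetric radial distortion: K1*r^3 + K2*r^5 + K3*r^7 (three terms)."""
    r2 = r * r
    # Nested form of the odd polynomial; avoids repeated exponentiation.
    return r * r2 * (k1 + r2 * (k2 + r2 * k3))
```

Because only odd powers of r appear, the function is odd in r and vanishes at the principal point.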


DECENTERING DISTORTION

The distortion due to errors in lens centering introduces tangential
distortion and asymmetric radial distortion. It should be appreciated
that it takes appreciable skill and patience on the part of an optical
technician in aligning the lens to suppress this distortion to within the
five micron level. A perfectly centered lens means that the centers of
curvature of all optical surfaces are collinear, but this goal is never
achieved in practice. However, we will use a mathematical model which is
successfully being used in the stellar calibration of numerous ballistic
cameras and some aerial cameras. As stated previously, the model we will
adopt is the one developed by D. Brown [1] as an extended version of
Conrady's model. Brown scanned the literature for topics concerning
decentered optical systems but found only a few reference books which touched
upon this subject. Most of these books and scientific papers, published by
various authors, adopt the aforementioned thin prism model.

The thin prism model describes the phenomenon that there exists on the
photographic glass plate an axis passing through the principal point along
which the tangential distortion is maximum. At right angles to the axis
of maximum tangential distortion is an axis of zero tangential distortion.
The tangential distortion along any other axis passing through the principal
point is proportional to that along the axis of maximum tangential distortion,
the constant of proportionality being the cosine of the angle between the
axis in question and the axis of maximum tangential distortion (Brown, Ref.
[1]). Analytically, the model is restricted to tangential distortion while
ignoring the radial component of decentering distortion. Brown shows that
the behavior of radial distortion is precisely the same as that for tangential
distortion except for a 90° phase shift. Thus, the axis of maximum radial
distortion corresponds to the axis of zero tangential distortion and
vice versa. At phase angles of $\phi - \phi_0 = n\pi/4$ ($n$ odd), the radial
and tangential components are of equal magnitude for a specified radial
distance (Brown, Ref. [1]). The angle $\phi$ is the angle between the positive
x-axis and the radius vector from the origin to the point whose coordinates
are x, y, and the angle $\phi_0$ is the angle between the positive x-axis and
the axis of maximum tangential distortion.

In order to circumvent the problem of finding a suitable
approximation for the angle $\phi_0$ and the other parameters in the model for
decentering distortion, Brown recasts the extended expressions for
Conrady's model into the form

$$\Delta_x = \left[P_1 (r^2 + 2x^2) + 2 P_2\, xy\right]\left[1 + P_3 r^2 + P_4 r^4 + \cdots\right] \tag{3}$$

$$\Delta_y = \left[2 P_1\, xy + P_2 (r^2 + 2y^2)\right]\left[1 + P_3 r^2 + P_4 r^4 + \cdots\right], \tag{4}$$

where

$$P_1 = -J_1 \sin\phi_0, \qquad P_2 = J_1 \cos\phi_0, \qquad P_3 = J_2/J_1, \qquad P_4 = J_3/J_1,$$

and

$$r = (x^2 + y^2)^{1/2}.$$

In our experiment we will carry only three parameters of decentering
distortion; namely, $J_1$, $J_2$, and $\phi_0$. This model should hold for any
number of decentered elements for short focal length aerial cameras.
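
Equations (3) and (4) translate directly into code. The Python sketch below is a hedged illustration: the function names are hypothetical, x and y are assumed to be measured from the principal point, and J3 defaults to zero because only J1, J2 and the phase angle are carried in the experiment.

```python
import math


def decentering_coefficients(j1, j2, phi0, j3=0.0):
    """Return (P1, P2, P3, P4) from J1, J2, J3 and the phase angle phi0."""
    if j1 == 0:
        raise ValueError("J1 must be nonzero for P3 = J2/J1 and P4 = J3/J1")
    return (-j1 * math.sin(phi0), j1 * math.cos(phi0), j2 / j1, j3 / j1)


def decentering_distortion(x, y, j1, j2, phi0, j3=0.0):
    """Return (delta_x, delta_y) of equations (3) and (4), truncated at r^4."""
    p1, p2, p3, p4 = decentering_coefficients(j1, j2, phi0, j3)
    r2 = x * x + y * y
    profile = 1.0 + p3 * r2 + p4 * r2 * r2
    dx = (p1 * (r2 + 2.0 * x * x) + 2.0 * p2 * x * y) * profile
    dy = (2.0 * p1 * x * y + p2 * (r2 + 2.0 * y * y)) * profile
    return dx, dy
```

With the phase angle set to zero, a point on the x-axis receives a displacement in y only, which matches the description of the x-axis as the axis of maximum tangential distortion in that case.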


PROJECTIVE  EQUATIONS 


We come finally to the projective equations which relate
corresponding vectors in image space (aerial photograph) with those in
object space (terrain). These equations are equivalent to another set
of equations known as the colinearity condition equations since they
describe the fact that the object point, the image point, and the
perspective center in the lens lie on the same straight line. These equations
are fundamental to many photogrammetric problems. Figure 4 will enable
the reader to gain some insight into the role played by the various
parameters in the colinearity equations for tilted photographs. The
role of the 3x3 matrix [M] shown in Figure 4 is that of an orthogonal
transformation from the photographic reference system (image space) to
the terrain system (object space) and vice versa. It represents three
sequential rotations in 3-space which, when multiplied together in the
proper order, yield the 3x3 orthogonal matrix [M]. The matrix [M]
involves three more parameters which must be determined from our experiment.
These three angular parameters will be denoted as $\alpha$, $\omega$, $\kappa$
and are inherent in the matrix [M] as follows:


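$$[M] = \begin{bmatrix} A & B & C \\ A' & B' & C' \\ D & E & F \end{bmatrix} = \begin{bmatrix} -\cos\kappa & \sin\kappa & 0 \\ \sin\kappa & \cos\kappa & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\sin\omega & \cos\omega \\ 0 & \cos\omega & \sin\omega \end{bmatrix} \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} (-\cos\alpha\cos\kappa - \sin\kappa\sin\omega\sin\alpha) & (\cos\kappa\sin\alpha - \sin\kappa\sin\omega\cos\alpha) & (\sin\kappa\cos\omega) \\ (\sin\kappa\cos\alpha - \cos\kappa\sin\omega\sin\alpha) & (-\sin\kappa\sin\alpha - \cos\kappa\sin\omega\cos\alpha) & (\cos\kappa\cos\omega) \\ (\cos\omega\sin\alpha) & (\cos\omega\cos\alpha) & (\sin\omega) \end{bmatrix} \tag{5}$$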
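
The closed form in equation (5) can be checked against the product of its three factors. The Python sketch below is an illustration under an assumption: the factor matrices are written as they can best be read from equation (5) in this copy, and the function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def orientation_matrix(alpha, omega, kappa):
    """[M] as the product of the kappa, omega and alpha factors."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    co, so = math.cos(omega), math.sin(omega)
    ck, sk = math.cos(kappa), math.sin(kappa)
    k_mat = [[-ck, sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]]
    w_mat = [[1.0, 0.0, 0.0], [0.0, -so, co], [0.0, co, so]]
    a_mat = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    return matmul(matmul(k_mat, w_mat), a_mat)


def orientation_matrix_closed_form(alpha, omega, kappa):
    """[M] written out element by element as in equation (5)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    co, so = math.cos(omega), math.sin(omega)
    ck, sk = math.cos(kappa), math.sin(kappa)
    return [[-ca * ck - sk * so * sa, ck * sa - sk * so * ca, sk * co],
            [sk * ca - ck * so * sa, -sk * sa - ck * so * ca, ck * co],
            [co * sa, co * ca, so]]
```

Multiplying the result by its transpose should return the identity matrix to within rounding, which is the orthogonality property the text relies on.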


If we now put together equations (1) and (5) we will have our
colinearity equations which relate the coordinates of image points with
those of object space. Since the original projective equations include a
scale factor, the colinearity equations eliminate this parameter through
division of the first two matrix equations by the third thereby yielding
the colinearity equations

$$x = x_p + c\,\frac{A(X - X^c) + B(Y - Y^c) + C(Z - Z^c)}{D(X - X^c) + E(Y - Y^c) + F(Z - Z^c)} \tag{6}$$

$$y = y_p + c\,\frac{A'(X - X^c) + B'(Y - Y^c) + C'(Z - Z^c)}{D(X - X^c) + E(Y - Y^c) + F(Z - Z^c)} \tag{7}$$


As stated previously, we have

$c$ = principal distance ($c = -f$)

$x, y$ = coordinates of image points (undistorted)

$x_p, y_p$ = coordinates of principal point

$X, Y, Z$ = coordinates of points in object space

$X^c, Y^c, Z^c$ = coordinates of the perspective center of the lens
(aircraft position)

$[M] = \begin{bmatrix} A & B & C \\ A' & B' & C' \\ D & E & F \end{bmatrix}$ = orthogonal orientation matrix defining the
rotational relationship between the x, y, z axes of image space and the
X, Y, Z axes of object space.


At this point, we have all the necessary models to conduct the
experiment for the calibration of an aerial camera in its operating
environment.
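
The colinearity equations (6) and (7) can be evaluated directly once [M] is known. The Python sketch below is a minimal illustration; the function name and argument layout are hypothetical, and [M] is passed in as nested lists with rows (A, B, C), (A', B', C'), (D, E, F).

```python
def project(m, ground, camera, xp, yp, c):
    """Image coordinates (x, y) of a ground point by equations (6) and (7).

    m      : 3x3 orientation matrix as nested lists
    ground : (X, Y, Z) of the object point
    camera : (Xc, Yc, Zc) of the perspective center
    xp, yp : principal point; c : principal distance (c = -f)
    """
    dx, dy, dz = (g - k for g, k in zip(ground, camera))
    num_x = m[0][0] * dx + m[0][1] * dy + m[0][2] * dz
    num_y = m[1][0] * dx + m[1][1] * dy + m[1][2] * dz
    den = m[2][0] * dx + m[2][1] * dy + m[2][2] * dz
    if den == 0:
        raise ZeroDivisionError("object point lies in the plane through the lens")
    return xp + c * num_x / den, yp + c * num_y / den
```

Dividing by the third row is what removes the scale factor, so the result does not change if the vector from camera to ground point is multiplied by any nonzero constant.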


OBSERVATIONAL  EQUATIONS 


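If we now collect the various models we have developed to
describe the undistorted values of the observed quantities, $\bar{x}$, $\bar{y}$
(measured coordinates are written here with an overbar), then
the distorted $\bar{x}$, $\bar{y}$ coordinates corrected for symmetric radial
distortion and decentering distortion are as follows:

$$(x - x_p) = \left(1 + \frac{\delta}{r}\right)(\bar{x} - x_p) + \Delta_x \tag{8}$$

$$(y - y_p) = \left(1 + \frac{\delta}{r}\right)(\bar{y} - y_p) + \Delta_y. \tag{9}$$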
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The  terms  for  radial  distortion  in  equation  (8)  and  (9)  are  arrived 
at  from  the  fact  that  if  only  radial  distortion  exists  in  the  lens, 
then  the  correction  to  the  measured  points  with  respect  to  the 
principal  point  in  the  photo  coordinate  system  is 

fix"  “  —  6  ,  6y'  =  ^  6 
r  r 

where 


X'  =  (x  -  xp)  and  y'  =  (y  -  yp). 


Therefore,  the  undistorted  photo  images  with  respect  to  the  principal 
point  are 


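$$(x - x_p) = x' + \delta x' = x' + \frac{x'}{r}\,\delta = x'\left(1 + \frac{\delta}{r}\right) = (\bar{x} - x_p)\left(1 + \frac{\delta}{r}\right)$$

and

$$(y - y_p) = y' + \delta y' = y' + \frac{y'}{r}\,\delta = y'\left(1 + \frac{\delta}{r}\right) = (\bar{y} - y_p)\left(1 + \frac{\delta}{r}\right),$$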


which  are  the  required  radial  distortion  terms  in  (8)  and  (9). 
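The radial term just derived can be sketched numerically as follows. This is a minimal illustration, not the report's reduction program: the function name `radial_correct` is hypothetical, the radial distortion function is passed in as an arbitrary callable because its explicit form is not given in this section, and the decentering terms of (8) and (9) are deliberately left out.

```python
import math


def radial_correct(x_meas, y_meas, xp, yp, delta):
    """Return (x - xp, y - yp) corrected for radial distortion only.

    x_meas, y_meas -- measured film coordinates
    xp, yp         -- principal point
    delta          -- callable giving the radial distortion at radius r
                      (hypothetical stand-in for the report's delta)
    """
    x_rel = x_meas - xp
    y_rel = y_meas - yp
    r = math.hypot(x_rel, y_rel)       # r = [(x-xp)^2 + (y-yp)^2]^(1/2)
    if r == 0.0:
        return 0.0, 0.0                # no radial shift at the principal point
    scale = 1.0 + delta(r) / r         # the factor (1 + delta/r)
    return x_rel * scale, y_rel * scale


if __name__ == "__main__":
    # Made-up distortion: one percent of the radial distance.
    print(radial_correct(3.0, 4.0, 0.0, 0.0, lambda r: 0.01 * r))
```

Because both coordinates are multiplied by the same factor, the corrected point stays on the ray from the principal point through the measured point, which is what "radial" means here.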

The following unknowns are implicit in the terms $\delta$, $\Delta_x$ and $\Delta_y$:

$K_1, K_2, K_3$ (coefficients of radial distortion)

$J_1, J_2, \phi_0$ (coefficients and phase angle of decentering
distortion)

$x_p, y_p$ (implicit in $\delta$ and $r$, where

$$r = \left[(\bar{x} - x_p)^2 + (\bar{y} - y_p)^2\right]^{1/2}\,)$$

The  substitution  of  equations  (8)  and  (9)  into  the  colinearity  equations 
(6)  and  (7)  introduces  the  coefficients  of  radial  and  decentering 
distortion  into  the  observational  equation.  Therefore,  we  may  express 
$\alpha_i$, $\omega_i$, $\kappa_i$ are the rotational elements of exterior
orientation and they are explicit in the matrix
$[M]_i$ for the ith photo,

$x_p$, $y_p$, $c$ are the elements of interior orientation
($x_p$, $y_p$ are also implicit in $\delta_{ij}$ and $r_{ij}$).

Finally, we see that each pair of film measurements gives rise to two
independent equations involving fifteen parameters, nine of which will
be common to all n photos and six of which are considered to be
changing from photo to photo. Thus, observational equations for all
n photos and all m measured images of precisely surveyed control points
constitute a system of 2mn equations in 6n + 9 unknowns.
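The bookkeeping behind this count can be checked with a few lines of code. The function name `system_size` is hypothetical; the values n = 20 and m = 40 are the ones quoted later in the report for the planned experiment.

```python
def system_size(n_photos, m_images, n_stationary=9, n_per_photo=6):
    """Return (number of equations, number of unknowns).

    Each measured image on each photo gives two equations; the unknowns
    are the nine stationary parameters plus six per photo.
    """
    equations = 2 * m_images * n_photos
    unknowns = n_per_photo * n_photos + n_stationary
    return equations, unknowns


if __name__ == "__main__":
    print(system_size(20, 40))
```

For 20 photographs with 40 images each this gives 1600 equations in 129 unknowns, which agrees with the figures stated for the resulting normal equation system.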


THE  LINEARIZED  OBSERVATIONAL  EQUATIONS 


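The pair of equations, (10) and (11), may be considered to be
of the functional form:

$$f_1(\bar{x}_{ij}, \bar{y}_{ij}, X_j, Y_j, Z_j, X_i^c, Y_i^c, Z_i^c, \alpha_i, \omega_i, \kappa_i, K_1, K_2, K_3, J_1, J_2, \phi_0, x_p, y_p, c) = 0 \tag{13}$$

$$f_2(\bar{x}_{ij}, \bar{y}_{ij}, X_j, Y_j, Z_j, X_i^c, Y_i^c, Z_i^c, \alpha_i, \omega_i, \kappa_i, K_1, K_2, K_3, J_1, J_2, \phi_0, x_p, y_p, c) = 0 \tag{14}$$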


In equations (13) and (14), the measured film coordinates are subject
to random errors. At present, we will treat the parameters as
completely unknown quantities. Later, we will develop observation
equations generated by considering these parameters as observed quantities.
This means that we will develop a weight constraining procedure based
on how well these parameters are known. Because equations (13) and (14)
are nonlinear, we will linearize them using a Taylor's series expansion
keeping only the zero and first order terms. Therefore, we write


$$\bar{x}_{ij} = \bar{x}_{ij}^{0} + v_{1ij}, \qquad \bar{y}_{ij} = \bar{y}_{ij}^{0} + v_{2ij}, \tag{15}$$

where $\bar{x}_{ij}^{0}$, $\bar{y}_{ij}^{0}$ denote the measured film coordinates for the jth
measured image on the ith photograph, and $v_{1ij}$, $v_{2ij}$ are the
corresponding measuring residuals. Also, we let


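$$\alpha_i = \alpha_i^{00} + \delta\alpha_i, \quad \omega_i = \omega_i^{00} + \delta\omega_i, \quad \kappa_i = \kappa_i^{00} + \delta\kappa_i \tag{16}$$

and

$$X_i^c = (X_i^c)^{00} + \delta X_i^c, \quad Y_i^c = (Y_i^c)^{00} + \delta Y_i^c, \quad Z_i^c = (Z_i^c)^{00} + \delta Z_i^c, \tag{17}$$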


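in which the superscripts 00 denote arbitrary approximations and the $\delta$'s
are the unknown corrections to the approximations. Further, we write

$$K_1 = K_1^{00} + \delta K_1, \quad K_2 = K_2^{00} + \delta K_2, \quad K_3 = K_3^{00} + \delta K_3,$$

$$J_1 = J_1^{00} + \delta J_1, \quad J_2 = J_2^{00} + \delta J_2, \quad \phi_0 = \phi_0^{00} + \delta\phi_0,$$

and

$$x_p = x_p^{00} + \delta x_p, \quad y_p = y_p^{00} + \delta y_p, \quad c = c^{00} + \delta c, \tag{18}$$

where the superscript 00 and the $\delta$'s have the same meaning as before.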
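The substitution of (15), (16), (17), (18) into (13) and (14) yields
the following form:

$$\begin{aligned} f_{1ij} &= f_1(\bar{x}_{ij}^{0} + v_{1ij}, \ldots, \alpha_i^{00} + \delta\alpha_i, \ldots, c^{00} + \delta c) = 0, \\ f_{2ij} &= f_2(\bar{x}_{ij}^{0} + v_{1ij}, \ldots, \alpha_i^{00} + \delta\alpha_i, \ldots, c^{00} + \delta c) = 0. \end{aligned} \tag{19}$$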


As stated previously, we will linearize these equations via Taylor's
series keeping only the zero and first order terms. Therefore, we get
equations (13) and (14) at the measured point using these approximations.
We will not derive the formulas for the partial derivatives since they
are straightforward and they do not add to the purpose of this report.
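For readers who want to see what keeping only the zero and first order terms amounts to, the sketch below linearizes an arbitrary function about an approximation point. It is not the report's procedure: the report uses analytical partial derivatives, whereas this sketch substitutes central finite differences, and the function name `linearize` and the toy function are hypothetical.

```python
def linearize(f, p0, h=1e-6):
    """Zero- and first-order Taylor terms of f about the point p0.

    Returns (f(p0), gradient), so that
        f(p0 + d) ~ f(p0) + sum(gradient[k] * d[k]).
    The partial derivatives are approximated by central differences,
    an assumption of this sketch rather than the report's method.
    """
    f0 = f(p0)
    gradient = []
    for k in range(len(p0)):
        up = list(p0)
        down = list(p0)
        up[k] += h
        down[k] -= h
        gradient.append((f(up) - f(down)) / (2.0 * h))
    return f0, gradient


if __name__ == "__main__":
    # Toy nonlinear function standing in for f1 or f2.
    toy = lambda p: p[0] ** 2 + 3.0 * p[1]
    print(linearize(toy, [2.0, 1.0]))
```

In the adjustment, the zero-order term plays the role of the constant (discrepancy) term and the first-order coefficients fill the rows of the coefficient matrices.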

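The linearized equations for the jth measured image on the ith
photo are put in the matrix form

$$v_{ij} + \dot{B}_{ij}\,\dot{\delta} + \ddot{B}_{ij}\,\ddot{\delta}_i = \epsilon_{ij} \tag{24}$$

where

$$\underset{(2,1)}{v_{ij}} = \begin{bmatrix} v_{1ij} \\ v_{2ij} \end{bmatrix}, \quad \underset{(2,9)}{\dot{B}_{ij}}, \quad \underset{(9,1)}{\dot{\delta}}, \quad \underset{(2,6)}{\ddot{B}_{ij}}, \quad \underset{(6,1)}{\ddot{\delta}_i}, \quad \underset{(2,1)}{\epsilon_{ij}}. \tag{25}$$

Here $\dot{\delta}$ contains the corrections to the nine stationary parameters
($\delta K_1$, $\delta K_2$, $\delta K_3$, $\delta J_1$, $\delta J_2$, $\delta\phi_0$, $\delta x_p$, $\delta y_p$, $\delta c$),
$\ddot{\delta}_i$ contains the corrections to the six elements of exterior
orientation of the ith photo ($\delta\alpha_i$, $\delta\omega_i$, $\delta\kappa_i$, $\delta X_i^c$, $\delta Y_i^c$, $\delta Z_i^c$),
$\dot{B}_{ij}$ and $\ddot{B}_{ij}$ are the corresponding coefficient matrices, and
$\epsilon_{ij}$ is the vector of constant terms.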

The matrix equations (24) and (25) represent the smallest matrix units
in the entire development; that is, they involve information from only
the jth measured point on the ith photograph. Remembering that the
measurements on all n photographs contribute to the solution of the nine
stationary parameters and that all m measured images on the ith photo
contribute to the solution of the six nonstationary parameters per
photo, we can express the linearized equations for all n photographs as


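$$v_j + \dot{B}_j\,\dot{\delta} + \ddot{B}_j\,\ddot{\delta} = \epsilon_j \tag{26}$$

in which

$$\underset{(2n,1)}{v_j} = \begin{bmatrix} v_{1j} \\ v_{2j} \\ \vdots \\ v_{nj} \end{bmatrix}, \quad \underset{(2n,9)}{\dot{B}_j} = \begin{bmatrix} \dot{B}_{1j} \\ \dot{B}_{2j} \\ \vdots \\ \dot{B}_{nj} \end{bmatrix}, \quad \underset{(2n,1)}{\epsilon_j} = \begin{bmatrix} \epsilon_{1j} \\ \epsilon_{2j} \\ \vdots \\ \epsilon_{nj} \end{bmatrix}, \tag{27}$$

and $\ddot{B}_j$ and $\ddot{\delta}$ are assembled correspondingly from the
$\ddot{B}_{ij}$ and the $\ddot{\delta}_i$.
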

If  in  the  next  step  we  collect  all  equations  generated  by  all 
measured  images,  we  have  the  matrix  equations 


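$$v + \dot{B}\,\dot{\delta} + \ddot{B}\,\ddot{\delta} = \epsilon \tag{28}$$

in which

$$\underset{(2mn,1)}{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \end{bmatrix}, \quad \underset{(2mn,9)}{\dot{B}} = \begin{bmatrix} \dot{B}_1 \\ \dot{B}_2 \\ \vdots \end{bmatrix}, \quad \underset{(2mn,6n)}{\ddot{B}} = \begin{bmatrix} \ddot{B}_1 & 0 & \cdots & 0 \\ 0 & \ddot{B}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \end{bmatrix}, \tag{29}$$

with $\ddot{\delta}$ of dimension (6n,1) and $\epsilon$ of dimension (2mn,1)
partitioned conformably.
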
It is in the expanded matrix equations (27) and (29) that we see
the difference between the analytical aerial triangulation and camera
calibration problems.

In order to develop the rigor necessary for the complete calibration
effort we must consider the possibility of weighting the observed
quantities $\bar{x}^0$, $\bar{y}^0$.


WEIGHT  MATRICES 

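We shall denote the covariance matrix of the film coordinates by

$$\underset{(2,2)}{\Lambda_{ij}} = \begin{bmatrix} \sigma^2_{\bar{x}^0_{ij}} & \sigma_{\bar{x}^0_{ij}\bar{y}^0_{ij}} \\ \sigma_{\bar{x}^0_{ij}\bar{y}^0_{ij}} & \sigma^2_{\bar{y}^0_{ij}} \end{bmatrix} \tag{30}$$

and we shall denote the weight matrix of $\bar{x}^0_{ij}$, $\bar{y}^0_{ij}$ to be

$$\underset{(2,2)}{W_{ij}} = \Lambda_{ij}^{-1}. \tag{31}$$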


Thus we allow the film coordinates for a given point to be correlated.

Let it suffice to say that it is possible to get correlation between
$\bar{x}^0_{ij}$, $\bar{y}^0_{ij}$ by considering the calibration of the instrument with which
the film measurements are made, and it is also possible that correlation
may arise from calibrating cameras which do not have flat fields. In
any case, by employing the full covariance matrix $\Lambda_{ij}$, we properly
propagate and preserve the informational content of the original observations
throughout the camera calibration effort.
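A minimal sketch of forming the (2,2) weight matrix as the inverse of a full covariance matrix is given below. The function name `weight_matrix` and the numerical values are hypothetical; only the relation W = (covariance matrix) inverse is taken from the text.

```python
def weight_matrix(var_x, var_y, cov_xy):
    """Invert the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]]."""
    det = var_x * var_y - cov_xy ** 2
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    return [[var_y / det, -cov_xy / det],
            [-cov_xy / det, var_x / det]]


if __name__ == "__main__":
    # Hypothetical variances and covariance of one pair of film coordinates.
    print(weight_matrix(2.0, 2.0, 1.0))
```

When the covariance term is zero the result reduces to the familiar reciprocal-variance weights; a nonzero covariance produces off-diagonal weights of opposite sign.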

We shall assume independence of film measurements of different
images. Therefore, we may express the covariance and weight matrices for
the point seen on all n photos as
We are now in a position to state that equations (28), (34), and (35)
contain all the information provided by our original observation
equations (10) and (11). However, we have not made provision for
overcoming a basic flaw in the camera calibration experiment; namely,
as stated previously, we cannot recover the elements of interior
orientation of the camera ($x_p$, $y_p$, $c$) since these elements are projectively
equivalent to changes in the coordinates of the exposure station. This
is why we have stated that an external tracking system is necessary
for the calibration experiment. Therefore, we must now develop additional
observation equations which will allow weight constraints to
be applied to Xc, Yc, Zc according to how well the tracking system
triangulates a camera station. Since we must develop at least three
additional observation equations, we will develop complete flexibility
and write observation equations for all parameters included in the
adjustment. This means that we will be able to incorporate into the
adjustment any a priori information concerning any, or all, of the
parameters involved in the calibration experiment. The a priori information
may come, for example, from a previous calibration.


OBSERVATION EQUATIONS GENERATED BY ELEMENTS OF ORIENTATION, RADIAL AND
DECENTERING DISTORTION


In order to develop the flexibility of constraining the unknown
parameters to within prescribed limits by weighting, we must develop
observation equations for all parameters involved in the calibration
problem. We shall assume that independent observations are available
for all parameters. Thus, using previous notations for observed
quantities, we write for the elements of interior orientation, radial
and decentering distortion parameters


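$$\begin{aligned} x_p &= x_p^0 + v_{x_p}, & y_p &= y_p^0 + v_{y_p}, & c &= c^0 + v_c, \\ K_1 &= K_1^0 + v_{K_1}, & K_2 &= K_2^0 + v_{K_2}, & K_3 &= K_3^0 + v_{K_3}, \\ J_1 &= J_1^0 + v_{J_1}, & J_2 &= J_2^0 + v_{J_2}, & \phi_0 &= \phi_0^0 + v_{\phi_0}, \end{aligned} \tag{36}$$
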

where the v's are observational residuals. If we eliminate the adjusted
observations from equations (18) and (36), we arrive at the observation
equations
Matrix equation (39), which involves observations on the stationary
parameters, is expanded as follows:


If we assume that observations on the stationary and nonstationary
parameters are independent of each other, the covariance and weight
matrices associated with the observational vectors (39) and (41) are


for the stationary parameters and


for the nonstationary parameters. In (45) we let $\ddot{\Lambda}_i$ denote the
covariance matrix of the observations of the elements of exterior
orientation for the ith photograph and let $\ddot{W}_i = \ddot{\Lambda}_i^{-1}$. It is not
necessary for these covariance and weight matrices to be diagonal.
They can be completely filled without creating undue strain on the
computations.

At this stage, we have three matrix observation equations arising
from

1. Measured film coordinates,

2. Stationary parameters,

3. Nonstationary parameters.

We are now in a position to form normal equations.


NORMAL  EQUATIONS 

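Writing the matrix observation equations as follows:

$$\begin{aligned} v + \dot{B}\,\dot{\delta} + \ddot{B}\,\ddot{\delta} &= \epsilon \\ \dot{v} - \dot{\delta} &= \dot{\epsilon} \\ \ddot{v} - \ddot{\delta} &= \ddot{\epsilon} \end{aligned} \tag{46}$$

we can merge these matrix observation equations into the single matrix
equation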
Brown [4] has shown that the [illegible] of the vectors $v$ and $\delta$ simultaneously
leads to the minimization of the quadratic form of the residuals

$$s = v^T W v. \tag{51}$$

[illegible] equations (47) and (50) and realize that the structure [illegible]
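which, after multiplication, becomes

$$\begin{bmatrix} \left(\sum_{i=1}^{n} \dot{N}_i\right) + \dot{W} & \bar{N} \\ \bar{N}^T & \ddot{N} + \ddot{W} \end{bmatrix} \begin{bmatrix} \dot{\delta} \\ \ddot{\delta} \end{bmatrix} = \begin{bmatrix} \left(\sum_{i=1}^{n} \dot{c}_i\right) - \dot{W}\dot{\epsilon} \\ \ddot{c} - \ddot{W}\ddot{\epsilon} \end{bmatrix} \tag{53}$$
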


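The individual matrix components of the normal equations (53)
and their dimensions are

$$\begin{aligned} \underset{(9,9)}{\left(\sum_{i=1}^{n} \dot{N}_i\right)} &= \underset{(9,2mn)}{\dot{B}^T}\ \underset{(2mn,2mn)}{W}\ \underset{(2mn,9)}{\dot{B}}, \\ \underset{(9,1)}{\left(\sum_{i=1}^{n} \dot{c}_i\right)} &= \underset{(9,2mn)}{\dot{B}^T}\ \underset{(2mn,2mn)}{W}\ \underset{(2mn,1)}{\epsilon}, \\ \underset{(6n,6n)}{\ddot{N}} &= \underset{(6n,2mn)}{\ddot{B}^T}\ \underset{(2mn,2mn)}{W}\ \underset{(2mn,6n)}{\ddot{B}}, \\ \underset{(6n,1)}{\ddot{c}} &= \underset{(6n,2mn)}{\ddot{B}^T}\ \underset{(2mn,2mn)}{W}\ \underset{(2mn,1)}{\epsilon}, \\ \underset{(9,6n)}{\bar{N}} &= \underset{(9,2mn)}{\dot{B}^T}\ \underset{(2mn,2mn)}{W}\ \underset{(2mn,6n)}{\ddot{B}}, \end{aligned} \tag{54}$$
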

•  • 

where, as stated previously, the $\ddot{N}$ and $\ddot{c}$ portions of the normal
equations are quickly and simply formed by virtue of the structure
of the $\ddot{B}$ matrix. The general normal equations for the simultaneous
adjustment of all n photographs are diagrammatically given as follows:

[Diagram (55): the normal equation matrix of 9 + 6n columns, partitioned
into 9 columns for the stationary parameters and 6n columns for the
nonstationary parameters, bordered by the constant columns.]


This is the first order regression scheme referred to earlier in the
report. This type of structured normal equations was successfully
solved by Brown [5]. Practically, a solution is possible no matter how
large n may be and it is found that the computations increase only
linearly with n. In our camera calibration experiment, the largest
matrix to be handled is of order 9. It is not our purpose to give full
details of the computation algorithm since these details are available
in many of Brown's reports, for example, reference [5]. Let it suffice
to say that in practice we will handle approximately 20 photographs
selected from four passes over a test range, and that we hope to obtain
at least 40 images on each photo. These images will be common to all 20
photographs. Therefore, we will have 800 measured images contributing
directly to the calibration of the stationary parameters whereas
40 measured images will contribute to the determination of each set of
nonstationary parameters. The resulting normal equation system will be
of order 129 generated from 1600 observation equations, if only the
equations from the linearization are considered.
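The reason the work grows only linearly with n can be illustrated with a generic block-elimination sketch for a bordered block-diagonal system such as (53). This is not taken from Brown's reports: it is the standard reduction (a Schur complement on the stationary block), written in plain Python, and every function name is hypothetical. The stationary block passed in is assumed to contain its weight matrix already, and the right-hand sides are assumed to contain the weighted discrepancy terms.

```python
def solve(A, B):
    """Solve A X = B (A is k x k, B is k x r) by Gauss-Jordan elimination."""
    k = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular block")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(k):
            if i != col and aug[i][col] != 0.0:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[k:] for row in aug]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def solve_bordered(N_dot, c_dot, photos):
    """Solve the bordered system one photo block at a time.

    N_dot  -- s x s stationary block (s = 9 in the report)
    c_dot  -- s x 1 stationary right-hand side
    photos -- list of (N_bar_i, N_ddot_i, c_ddot_i) with shapes
              s x q, q x q, q x 1 (q = 6 in the report)
    Returns (stationary corrections, list of per-photo corrections).
    """
    S = [row[:] for row in N_dot]
    rhs = [row[:] for row in c_dot]
    for N_bar, N_dd, c_dd in photos:
        Y = solve(N_dd, transpose(N_bar))      # N_dd^-1 N_bar^T
        z = solve(N_dd, c_dd)                  # N_dd^-1 c_dd
        NY, Nz = matmul(N_bar, Y), matmul(N_bar, z)
        for i in range(len(S)):
            rhs[i][0] -= Nz[i][0]
            for j in range(len(S)):
                S[i][j] -= NY[i][j]
    d_dot = solve(S, rhs)                      # only an s x s system
    d_ddot = []
    for N_bar, N_dd, c_dd in photos:           # back-substitute per photo
        r = matmul(transpose(N_bar), d_dot)
        d_ddot.append(solve(N_dd, [[c - v] for (c,), (v,) in zip(c_dd, r)]))
    return d_dot, d_ddot
```

Each photo contributes one small block solve, so the cost per photo is fixed and the only system shared by all photos has the order of the stationary block, which is consistent with the statement that the largest matrix handled is of order 9.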


CONCLUSIONS 


We consider SMAC to be a total system calibration since the
calibrated distortion parameters will include effects of the camera
window and the shockwave of the airstream upon the window. Also,
since the systematic errors tend to be independent from one frame to
the next, the estimates of the stationary parameters will not be unduly
influenced by these types of errors. We feel that this represents the
actual conditions in practice and, therefore, SMAC is a significantly
superior process as compared to one which might use single frames each
employing 10 times as many measured images as on a SMAC frame.

A by-product of the SMAC reduction is the covariance matrix of
the adjusted parameters. This permits evaluation of error bounds of the
calibrated functions of radial and decentering distortion along with those
of the parameters of exterior orientation.

A quantitative calibration of existing decentering distortion in
a lens is especially important since the practical limit of the length
of an analytical aerial control extension project is heavily dependent
upon the elimination of the effects of decentering distortion.

In conclusion, then, we feel that a SMAC approach to the problem
of calibrating aerial cameras in their operating environment is both
feasible and practical, and that its potential value in other modes of
operation, such as stellar calibrations of aerial and ballistic cameras,
is yet to be realized. We hope to have the results of a SMAC
calibration in the near future; that is, both a stellar SMAC and an aerial
SMAC of at least two aerial mapping cameras so that we will have a
comparison of three types of calibration (laboratory, stellar, aerial)
of the same cameras. These comparisons should prove the effectiveness
of the aerial SMAC approach.
DESIGN AND ANALYSIS OF A STATISTICAL EXPERIMENT
ON HIGH-VOLTAGE BREAKDOWN IN VACUUM*


M. M. Chrepta, 0. V. Taylor, and M. H. Zinn

Electron Tubes Division
Electronic Components Laboratory
US Army Electronics Command, Fort Monmouth, N. J.


The problem of high-voltage breakdown in vacuum has been studied for
more than forty years. From these studies many conflicting theories have
evolved that still do not reliably define a breakdown criterion nor explain
the mechanisms involved in the process. High-voltage breakdown in vacuum
has received renewed interest in recent years because of the demands for
superpower radar system components, ion thrusters for space propulsion, and
high-energy particle accelerators.

The study of the factors that affect high-voltage breakdown in vacuum
is being performed at this laboratory using statistically designed
experiments. Initially, the sixteen factors shown in Table I were defined as
probable contributors to the breakdown process:

TABLE I - FACTORS AFFECTING BREAKDOWN

Inflexible Factors            Flexible Factors

1. Cathode Material           12. Residual Gas Pressure
2. Anode Material             13. Energy of Supply
3. Cathode Finish             14. Contaminant
4. Anode Finish               15. Magnetic Field
5. Cathode Geometry           16. Electrode Spacing
6. Anode Geometry
7. Vehicle Bakeout
8. Envelope Material

The objective of this program is to analyze the significance of each of
these factors as well as their interactions.

The first designed experiment was carried out using seven of the
inflexible factors, each at two levels, in a 2^(7-2) plan (Table II) derived
from Table M of Davies.


*Sponsored by Advanced Research Projects Agency under US Army Electronics
Command Contract DA28-043 AMC-00394(E), ARPA Order No. 517, PROJECT DEFENDER
TABLE II - FACTOR LEVELS FOR INVESTIGATION

CATHODE MATERIAL      Ti-7Al-4Mo, 304-SS, OFHC Cu
ANODE MATERIAL        Ti-7Al-4Mo, 304-SS, OFHC Cu
CATHODE FINISH        COARSE, FINE
ANODE FINISH          COARSE, FINE
CATHODE GEOMETRY      SPHERE, PLANE
ANODE GEOMETRY        SPHERE, PLANE
VEHICLE BAKEOUT       PRESENT



Table III shows the levels of each factor for each of the thirty-two
treatments. The minus sign in each treatment means that the factor is either
at the low level or absent from the treatment; the plus sign, that the
factor is at the high level or present in the treatment. The set is
orthogonal; each level of any factor is tested equally against each of the other
factor level combinations:


TABLE III - 2^(7-2) PLAN

[Table of the thirty-two treatments, giving the sign (+ or -) of each of
the seven factors for each treatment; the individual entries are not
legible in the source.]


The letter assignments, shown in Table IV, were carefully chosen so
that in the treatment and analysis of the results the effect of any
two-factor interaction involving the bakeout factor, D, would be clear of any
other main effect or two-factor interaction of interest:
TABLE IV - LETTER ASSIGNMENT
A - Anode Material
B - Cathode Shape
C - Cathode Material
D - Bakeout
E - Anode Shape
F - Anode Finish
G - Cathode Finish
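As an illustration, a thirty-two treatment plan of this kind can be generated from the defining relation I = -ABDFG = -CDEFG = ABCE given in Table V. The sketch below assumes that A, B, D, E and G are varied as the basic factors and that C and F are set from the generators C = ABE and F = -ABDG, which follow from that relation; the function name `build_plan` is hypothetical and the run order produced is the standard order, not the randomized order actually used.

```python
from itertools import product


def build_plan():
    """Return the 32 treatments of the 2^(7-2) plan as (label, signs)."""
    plan = []
    # Standard order: A changes fastest, then B, D, E, G.
    for g, e, d, b, a in product((-1, 1), repeat=5):
        c = a * b * e           # from I = ABCE
        f = -a * b * d * g      # from I = -ABDFG
        signs = {"a": a, "b": b, "c": c, "d": d, "e": e, "f": f, "g": g}
        # A letter appears in the label when its factor is at the + level.
        label = "".join(k for k in "abcdefg" if signs[k] == 1) or "(1)"
        plan.append((label, signs))
    return plan


if __name__ == "__main__":
    for label, signs in build_plan():
        row = " ".join("+" if signs[k] == 1 else "-" for k in "abcdefg")
        print(f"{label:8s} {row}")
```

Every factor column produced this way contains sixteen plus and sixteen minus signs, and any two columns are orthogonal, which is the property described for Table III.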

The treatments were randomized and performed in the test vehicle shown
in Fig. 1:
Each treatment was carried out in this manner: The voltage was increased
in 10-kV steps, each step held for two minutes. During this procedure,
the microdischarges (self-quenching pulses of current), hydrogen evolutions,
X-radiation, visible radiation, and prebreakdown current were monitored.

The voltage was increased in this stepwise manner until puffs of hydrogen
were detected by the mass spectrometer. This voltage was recorded. After
the gap was outgassed again, the increase of voltage was continued until
sparking occurred. This voltage was recorded as the first breakdown
voltage. This procedure was repeated for each treatment at six electrode
separations from 0.5 to 3.0 cm. During the application of voltages at each
gap setting, the sparking and gas evolution condition the electrodes so
that higher voltages may be held off. These higher voltages were also
recorded for the analysis.

Thus, we have three sets of yields of voltages that can be incorporated
as the inputs to the design plan for analysis. These numbers, inserted in
the boxes of the design table and treated with the signs shown, will give
the deviation from the average of the whole experiment for each factor and
factor interaction. The results can be obtained in a more systematic
manner by using the Yates Algorithm, which consists of repeatedly adding and
subtracting adjacent test results until the results for the mean, main
effects and two-factor interactions are obtained, as shown in Table V:


TABLE V - DEFINING RELATION

I = -ABDFG = -CDEFG = ABCE

YIELDS OF YATES ALGORITHM

 1  mean           12  ABE + C        23  BDG - AF
 2  A              13  DE             24  ABDG - F
 3  B              14  ADE            25  EG
 4  AB + CE        15  BDE            26  AEG
 5  D              16  ABDE + CD      27  BEG
 6  AD             17  G              28  ABEG + CG
 7  BD             18  AG             29  DEG - CF
 8  ABD - FG       19  BG             30  ADEG
 9  E              20  ABG - DF       31  BDEG
10  AE + BC        21  DG             32  ABDEG - EF
11  BE + AC        22  ADG - BF
This shows that we can test seven main effects and six two-factor
interactions with D (the bakeout) plus the mean. The others may be used
for estimating error.
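The adding and subtracting of adjacent results that makes up the Yates Algorithm can be sketched as below. This is a generic textbook form, not the laboratory's program; the function name `yates` is hypothetical. The responses are assumed to be listed in standard order of the basic factors, and the output positions then correspond to the order numbers of Table V.

```python
def yates(responses):
    """Apply the Yates Algorithm to 2^k responses in standard order.

    Returns the final column: element 0 is the grand total and the
    remaining elements are the contrast totals in standard order.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n == 0 or (1 << k) != n:
        raise ValueError("number of responses must be a power of two")
    column = list(responses)
    for _ in range(k):
        # Sums of adjacent pairs go on top, differences underneath.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    return column


if __name__ == "__main__":
    # Made-up 2^2 example with responses for (1), a, b, ab.
    print(yates([1.0, 2.0, 3.0, 4.0]))
```

Dividing the first element by the number of runs gives the mean, and dividing each remaining element by half the number of runs gives the corresponding effect estimate; with thirty-two runs the divisors are 32 and 16.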

The analysis was carried out using the Yates Algorithm with inputs of
the voltages obtained. The results of this analysis indicated a low level
of confidence for the effects. Therefore, the voltages were plotted versus
distance to the one-half power, since these and many other experimental
results have been found to follow this relationship. From these plots a slope
was calculated and used as the input to the Yates program. This slope, using
the average of many points, smoothed out the values as well as the error
and gave more significant results.
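One way such a slope could be obtained is an ordinary least-squares fit of voltage against the square root of the gap distance. The text does not say how the slope was calculated from the plots, so the fitting method here is an assumption, as are the function name `sqrt_slope`, the equally spaced gap settings and the voltages in the example.

```python
import math


def sqrt_slope(gaps_cm, voltages_kv):
    """Least-squares slope of voltage versus sqrt(gap distance)."""
    if len(gaps_cm) != len(voltages_kv) or len(gaps_cm) < 2:
        raise ValueError("need at least two matching gap/voltage pairs")
    x = [math.sqrt(d) for d in gaps_cm]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(voltages_kv) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, voltages_kv))
    return sxy / sxx


if __name__ == "__main__":
    gaps = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]                 # assumed spacing
    volts = [100.0 * math.sqrt(d) + 5.0 for d in gaps]    # made-up data
    print(sqrt_slope(gaps, volts))
```

A single slope per treatment condenses the six gap settings into one number, which is why it is less sensitive to scatter in any individual voltage reading.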

These results are plotted on half-normal graph paper as shown in Fig. 2:


Fig. 2 Half-normal plot of coefficients obtained
from the Yates Algorithm.


This graph is designed to give a straight line for any random process.
The order number represents the range of values, from smallest to largest,
corresponding to the coefficients obtained from the Yates analysis.
Deviations from a straight line indicate that the factor has a significant
influence on the distribution of the thirty-two values and that the values
are other than random. This plot shows that a straight line can be drawn
through most of the points that represent effects of the factors with little
or no deviation from the average of the experiment. The points labeled A,
E, B, D, and AE+BC are real effects, and the significance is indicated by
the distance from the straight line. The AE+BC effect, however, does not
yield any information because the AE effect cannot be distinguished from
the BC effect. From this experiment these conclusions can immediately be drawn:

1. A, E (the anode material and geometry) are most important.

2. B (the cathode geometry) is important.

3. D (the bakeout factor) is important, but less than the above.

The level of the anode geometry factor that raised the breakdown
voltage is the spherical electrode. This might also be said for the cathode
geometry, but with less confidence. When the anode material was titanium
alloy, higher breakdown voltages were reached than when it was copper.

The bakeout factor, D, was pertinent to this experiment with the test
vehicle designed for this study. The two levels of bakeout were complete
system and electrode bakeout versus electrode-only bakeout. The electrodes
were equipped with internal heaters for this purpose. The complete system
and electrode bakeout level is superior to electrode-only bakeout for
attaining higher breakdown voltages.

Along with the statistical analysis, the results of the experiment
were analyzed as to the physical processes occurring in the highly stressed
electrode system. As previously stated, the hydrogen partial pressures
were monitored on the mass spectrometer. Large bursts of gas were
coincident with sparking or breakdown. Also, the superiority of spherical
electrodes in holding off higher voltages suggested a breakdown mechanism
dependent on the amount of gas present in the gap and the pumping conductance
of the electrode gap system caused by the shape and size of the electrodes
and the gap distance. A theory was proposed whereby the gas conductance
of the gap played a major part in the breakdown process.1 Simply stated,
small-area electrodes with a high-conductance gap will hold off higher
voltages than large-area electrodes at the same gap spacing. To evaluate
this theory, a second statistically designed block-of-eight experiment was
derived. The objective of this experiment was to verify the gas pumping
conductance theory. The factors chosen were anode processing, cathode
processing, and electrode size. The two levels of electrode processing
are hydrogen baked versus vacuum baked, and, for size, a 4" versus 4/3"
diameter Bruce plane, as shown in a. of Table VI. Because of the simplicity
of this full factorial 2^3 experiment, it was decided to incorporate a
transverse magnetic field as a factor at the end of each treatment, as shown in
b. of Table VI. The treatment was repeated with magnetic field and then
again without magnetic field to show up any consistent difference between
the first and third breakdown voltages because of the application of the
magnetic field:

1. M. J. Mulcahy, A. Watson, and W. R. Bell, "High Voltage Breakdown Study,"
USAECOM Contract DA28-043 AMC-00394(E), ARPA Order No. 517 (1967).
TABLE VI - FACTORS AND LEVELS FOR BLOCK-OF-EIGHT EXPERIMENT

a. Without Magnetic Field

Factor                         Letter   Level
                                        High                Low

Anode Processing               A        a - Vacuum Baked    1 - Hydrogen Baked
Cathode Processing             B        b - Vacuum Baked    1 - Hydrogen Baked
Electrode Size                 C        c - Large           1 - Small

b. With Magnetic Field

Anode Processing               A        a - Vacuum Baked    1 - Hydrogen Baked
Cathode Processing             B        b - Vacuum Baked    1 - Hydrogen Baked
Electrode Size                 C        c - Large           1 - Small
Perpendicular Magnetic Field   D        d - Present         1 - Absent

This is now a complete 2^4 factorial experiment and can be analyzed separately
as two 2^3 experiments, as shown in Table VII:


TABLE VII - EXPERIMENTAL ORDER

Order   Electrodes                            Main     Perpendicular
                                              Block    Block

1       Anode 4-inch Bruce h-baked            c        cd
        Cathode 4-inch Bruce h-baked

2       Anode 4/3-inch Bruce h-baked          (1)      d
        Cathode 4/3-inch Bruce h-baked

3       Anode 4-inch Bruce vac-baked          ac       acd
        Cathode 4-inch Bruce h-baked

4       Anode 4/3-inch Bruce vac-baked        ab       abd
        Cathode 4/3-inch Bruce vac-baked

5       Anode 4/3-inch Bruce h-baked          b        bd
        Cathode 4/3-inch Bruce vac-baked

6       Anode 4-inch Bruce h-baked            bc       bcd
        Cathode 4-inch Bruce vac-baked

7       Anode 4-inch Bruce vac-baked          abc      abcd
        Cathode 4-inch Bruce vac-baked

8       Anode 4/3-inch Bruce vac-baked        a        ad
        Cathode 4/3-inch Bruce h-baked
The tests were performed similarly to the stepwise voltage increase
procedure as described before. The resulting voltages are plotted versus
distance to the one-half power. In Fig. 3, first the average effect, $\mu$,
without magnetic field present, is plotted with the average effect with
magnetic field present. It can be seen immediately that the magnetic field
lowers the breakdown voltage except at the smallest spacing tested:


Fig. 3 Breakdown voltage versus gap separation
in centimeters to the one-half power for
average values with and without magnetic
field.


In Fig. 4, the effects of the factors A1, AE, and A3 are shown by
subtracting the values individually from the corresponding overall average
breakdown value, $\mu$. The subscript 1 refers to the conditioned breakdown
value prior to applying magnetic field and 3 refers to the breakdown value
after application of magnetic field. The differences in these values are
indicative of a memory of the conditions imposed by the magnetic field
after it was removed. Other main effects and two-factor interactions are
plotted similarly:
Fig. 4 Breakdown voltage versus gap separation
in centimeters to the one-half power for
average values with factor A and two-
factor interaction AE.


From these curves, the principal conclusions that can be stated with
a good measure of confidence are as follows:

1.  The  hydrogen-baking  procedure  permitted  higher 
breakdown  voltages  than  did  the  vacuum-baking. 

The  magnetic  field  amplified  this  difference. 

2.  Large-area  electrodes  reduced  the  breakdown  voltage, 
which  is  consistent  with  the  results  obtained  in 
the  first  experiment.  The  magnetic  field  had  no 
effect  in  this  case. 

3.  The combined effect of hydrogen-baking of the
cathode and using small electrodes raises the
breakdown voltage.  This effect is amplified in
the presence of a magnetic field.

The results of these experiments, presented in this manner, show with
a good degree of confidence what can be expected when electrodes are
designed for high-voltage devices. These data are for copper electrodes.
Other materials of interest to vacuum component design engineers will be
similarly analyzed.

The next experiment (now being conducted) was designed as a full
factorial with six factors at two levels, as a result of some three-factor
interactions showing up in the analysis of the block-of-eight experiment.

■T*H4  fl  4  a  flnne  4  m  am  +  n  Va  hammI  a4- a  nw.4  aasaan  »1  1  4*k  a  4  iiamaam  4m  fViA 

breakdown  process. 

Different materials, as well as the other factors initially named,
will be introduced into each successive experiment. The results of this
program will be compiled in the form of graphs and charts for the high-
voltage design engineer.
IMPROVING BINOMIAL RELIABILITY ESTIMATES -
A MODERATELY DISTRIBUTION FREE TECHNIQUE FOR
SMALL SAMPLE RELIABILITY ESTIMATION


Michael  G.  Billings 
C-E-I-R,  Inc. 

Dugway  Proving  Ground,  Dugway,  Utah 


1.  INTRODUCTION.  The  purpose  of  this  article  is  to  demonstrate  how  lower 
confidence  bounds  for  reliability  obtained  using  the  distribution  free 
binomial  approach  can  be  improved  under  fairly  nonrestrictive  assumptions 
on  the  random  variable  involved.  The  technique  to  be  described,  referred 
to  hereafter  as  the  MDF  technique,  is  an  extension  of  the  result  presented 
at  the  1966  Army  Design  Conference  (see  [1]). 

2.  THE MDF TECHNIQUE. Suppose that the random variable under consideration
is continuous (i.e., has an absolutely continuous distribution function) and
nonnegative with distribution function F(x) and density function F'(x).
Suppose further that the mission for which the reliability is to be estimated
can be expressed as a number T in the domain of F(x), and suppose that the
reliability is to be estimated on the basis of a sample of n independent
systems from the population under investigation. The following Proposition
provides the basis for the MDF estimation technique.

Proposition 1. Let Y be the number of mission failures in n trials.
For $\gamma \in (0,1)$ let C(Y) be the solution to the equation

$$\sum_{f=0}^{Y} \binom{n}{f} [C(Y)]^{f} [1 - C(Y)]^{n-f} = 1 - \gamma.$$

Let $M(Y) = X_{(Y+1)}/T$, where $X_{(Y+1)}$ is the (Y+1)th order statistic
and T is the mission. Finally, let $k(\gamma,n)$ be determined by the equation

$$\sum_{f=0}^{n^*} \binom{n}{f} [k(\gamma,n)C(0)]^{f} [1 - k(\gamma,n)C(f)]^{n-f} = 1 - \gamma,$$

where $n^* \le n$, $1 - k(\gamma,n)C(n^*) > 0$ and
$1 - k(\gamma,n)C(n^*+1) \le 0$. If F'(x) is monotone nondecreasing on
$[0, X_{(Y+1)}]$, then

$$\Pr\left\{1 - k(\gamma,n)\frac{C(Y)}{M(Y)} \le 1 - F(T)\right\} \ge \gamma.$$

The proof of Proposition 1 is lengthy and is included in the Appendix.

The estimator $1 - k(\gamma,n)\,C(Y)/M(Y)$ will be called the MDF
$\gamma$-confidence lower-bound estimator for the reliability 1 - F(T).
It is seen that for each Y the number 1 - C(Y) is simply the binomial
$\gamma$-confidence lower bound estimator for the reliability based on Y
failures in n trials. Values of this estimator are tabulated for selected
values of n and $\gamma$ [2]. Precise determination of the number
$k(\gamma,n)$ for each pair $(\gamma,n)$ is most easily accomplished on a
computer. Table 1 presents values of $k(\gamma,n)$ for three confidence
levels (.90, .95 and .99) for selected values of n from 5 to 100. Values of
$k(\gamma,n)$ intermediate to values of n given can be obtained quite
accurately by linear interpolation.
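
The defining equation for k in Proposition 1 can be solved numerically. The following is a minimal Python sketch, not the author's program: the function names, the use of bisection, and the iteration counts are all assumptions made here for illustration. It first finds each C(f) by inverting the binomial sum, then searches for the k at which the sum in Proposition 1 equals 1 - gamma.

```python
from math import comb


def binomial_upper_bound(y, n, gamma, iterations=100):
    """C(y): the p solving sum_{f<=y} comb(n,f) p^f (1-p)^(n-f) = 1 - gamma."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        cdf = sum(comb(n, f) * mid**f * (1.0 - mid) ** (n - f)
                  for f in range(y + 1))
        if cdf > 1.0 - gamma:   # the binomial sum decreases as p grows
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mdf_sum(k, n, c):
    """Left-hand side of the equation defining k; c[f] holds C(f)."""
    total = 0.0
    for f in range(n):
        tail = 1.0 - k * c[f]
        if tail <= 0.0:         # f has passed n*, remaining terms are dropped
            break
        total += comb(n, f) * (k * c[0]) ** f * tail ** (n - f)
    return total


def solve_k(n, gamma, iterations=100):
    """Bisection for k on [1, 1/C(0)], where the sum falls from above to below 1 - gamma."""
    c = [binomial_upper_bound(f, n, gamma) for f in range(n)]
    lo, hi = 1.0, 1.0 / c[0]
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mdf_sum(mid, n, c) > 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for n in (5, 15):
        print(n, round(solve_k(n, 0.90), 5))
```

For n = 5 and a confidence level of .90 the result should agree closely with the Table 1 entry 1.13222; small differences in the last digits are to be expected from rounding.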


In an application of the MDF technique to a specific problem, a confidence
level $\gamma$ is chosen first; from the observed data one then determines
the value of $M(Y) = X_{(Y+1)}/T$, where Y is the observed number of failures
of the mission and $X_{(Y+1)}$ the (Y+1)th order statistic for the sample.
The value C(Y) is obtained from binomial reliability tables (1 - C(Y) is the
lower $\gamma$-confidence bound estimate for the mission reliability based on
Y failures). Finally, the value of $k(\gamma,n)$ is obtained from Table 1.
Thus, for example, if Y = f, then according to Proposition 1,
$1 - k(\gamma,n)\,C(f)/M(f)$ is the MDF lower $\gamma$-confidence estimate of
the reliability 1 - F(T) for the mission T.


Example 1. Suppose that for a given reliability estimation problem the
mission is T = 3.2 hours, and the times to failure for a sample of 15 systems
are given by {11.9, 5.8, 8.1, 13.2, 12.7, 12.6, 25.6, 20.2, 9.2, 20.6, 14.2,
17.8, 19.8, 28.1, 12.2}. Let the confidence level be $\gamma$ = .90, and
suppose it can be assumed (see Example 3) that the probability density
function for the time to failure random variable X is monotone nondecreasing
on $[0, X_{(Y+1)}]$, where Y is the number of mission failures (i.e.
$X_{(Y+1)} = X_{(1)} = 5.8$). In accordance with the above description, the
MDF lower .90 confidence bound for the mission reliability is obtained as
follows: From the sample data, $M(Y) = M(0) = X_{(1)}/3.2 = 1.81$. From
binomial reliability tables C(Y) = C(0) = .142. By an interpolation in
Table 1, k(.90, 15) is determined to be 1.17132. Thus, according to
Proposition 1, the MDF .90 confidence lower bound estimate for 1 - F(T), the
mission reliability, is

$$1 - 1.17132\left(\frac{.142}{1.81}\right) = .9081.$$

(The corresponding binomial estimate is .8577.)

Example 2. Suppose circumstances are the same as for Example 1, except
that the data are as follows: {0.9, 4.1, 4.6, 4.7, 7.1, 7.5, 7.9, 11.1, 11.1,
11.5, 15.9, 17.5, 18.1, 21.9, 22.3}. Then

$$Y = 1, \quad M(Y) = M(1) = \frac{X_{(2)}}{3.2} = 1.28, \quad C(Y) = C(1) = .2356$$

(from binomial reliability table with n = 15, $\gamma$ = .90) and, as before,
k(.90, 15) = 1.17132. Thus, the MDF .90 confidence lower bound estimate for
1 - F(3.2) is

$$1 - 1.17132\left(\frac{.2356}{1.28}\right) = .7844.$$

(The corresponding binomial estimate would be .7644.)
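
The arithmetic of Examples 1 and 2 can be retraced with a short Python sketch. This is an illustration under stated assumptions, not the author's procedure: the helper names are hypothetical, a system is counted as a mission failure when its time to failure is below T, C(Y) is computed by bisection instead of being read from binomial tables, and k is taken as the Table 1 interpolation 1.17132.

```python
from math import comb

K_90_15 = 1.17132  # k(.90, 15), interpolated from Table 1 as in the text


def binomial_upper_bound(y, n, gamma, iterations=100):
    """C(y): the p solving sum_{f<=y} comb(n,f) p^f (1-p)^(n-f) = 1 - gamma."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        cdf = sum(comb(n, f) * mid**f * (1.0 - mid) ** (n - f)
                  for f in range(y + 1))
        if cdf > 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mdf_and_binomial_bounds(sample, mission, gamma, k):
    """Return (MDF lower bound, binomial lower bound) for the reliability."""
    xs = sorted(sample)
    n = len(xs)
    y = sum(1 for x in xs if x < mission)   # observed mission failures
    if y == n:
        raise ValueError("every system failed; X_(Y+1) does not exist")
    m = xs[y] / mission                     # M(Y) = X_(Y+1) / T
    c = binomial_upper_bound(y, n, gamma)   # C(Y)
    return 1.0 - k * c / m, 1.0 - c


example_1 = [11.9, 5.8, 8.1, 13.2, 12.7, 12.6, 25.6, 20.2, 9.2, 20.6, 14.2,
             17.8, 19.8, 28.1, 12.2]
example_2 = [0.9, 4.1, 4.6, 4.7, 7.1, 7.5, 7.9, 11.1, 11.1, 11.5, 15.9,
             17.5, 18.1, 21.9, 22.3]

if __name__ == "__main__":
    for name, data in (("Example 1", example_1), ("Example 2", example_2)):
        mdf, binomial = mdf_and_binomial_bounds(data, 3.2, 0.90, K_90_15)
        print(name, round(mdf, 4), round(binomial, 4))
```

Because the sketch keeps M(Y) and C(Y) unrounded, its results can differ from the printed values in the fourth decimal place.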

3.  APPLICABILITY OF THE TECHNIQUE. Whether the MDF technique should be
applied to a given problem depends on the extent to which the analyst can
justify the necessary assumptions regarding the problem and the distribution
function involved. Recall that Proposition 1 requires that the density
function F'(x) be monotone nondecreasing on the interval $[0, X_{(Y+1)}]$.
This is actually a stronger requirement than necessary for the validity of
Proposition 1: it is noted in the proof of Proposition 1 (Appendix) that the
monotonicity requirement is only used to guarantee that
$F(X_{(Y+1)}) \ge \frac{X_{(Y+1)}}{T} F(T)$; however, this inequality is
valid for a much larger class of distributions than that characterized by the
monotonicity (nondecreasing) of the density function.

Since no simple characterization of the more general class of distributions
appears to exist, the purposes of this paper are best served by confining
attention to the class of distributions with density functions which are
monotone nondecreasing on $[0, X_{(Y+1)}]$.

In a particular application then, the analyst must be able to justify the
use of the assumption that the density function F'(x) is monotone
nondecreasing on $[0, X_{(Y+1)}]$. Indications are that the technique is
fairly insensitive to other than serious departures from the assumption, and
therefore that a relatively loose or insensitive justification technique can
be employed. Unfortunately, there appears to be no specific test of the
hypothesis that the density function F'(x) is monotone nondecreasing on
$[0, X_{(Y+1)}]$ available at present. It is possible that adaptations of
certain existing tests, such as the test for a nondecreasing failure rate
proposed by Proschan [3], may lead to a suitable test for MDF applications.
This possibility is being investigated.

An approach which seems reasonable in view of the apparent insensitivity
of the MDF technique to departures from the monotonicity assumption is the
following: On the basis of the data, one selects a known distribution function
$F_0(x)$ which appears to be a reasonable candidate for the true distribution
function of the random variable involved and is reasonably representative
of the data over the interval $[0, X_{(Y+1)}]$. Thus, for example, the analyst
might decide that a normal distribution with $\mu$ and $\sigma^2$ equal to the
sample mean and sample variance respectively is not an inappropriate
selection; again, one might choose a Weibull and estimate the parameters
graphically. Having selected $F_0(x)$, one then would apply the
Kolmogorov-Smirnov test, using the selected distribution function in the null
hypothesis $H_0: F(x) = F_0(x)$. If the null hypothesis is not rejected, and
if the selected function $F_0(x)$ has a monotone nondecreasing first
derivative $F_0'(x)$ on $[0, X_{(Y+1)}]$, then one concludes that it is
possible to apply the MDF technique to estimate the reliability.

Note that application of the MDF technique in this case is less hazardous
(more conservative) than using the hypothesized distribution function
$F_0(x)$ to estimate the reliability. Further, the MDF technique is a much
simpler method to apply than classical approaches which involve the
hypothesized distribution function $F_0(x)$ in the sense that one does not
need to be concerned about estimation of the parameters in $F_0(x)$ and the
associated problems encountered in obtaining lower confidence bounds in terms
of the estimators.


Example 3. For the data from Example 1, suppose it is hypothesized
that the distribution function for this data is normal with $\mu$ = 15.5 and
$\sigma^2$ = 40.4 (15.5 and 40.4 are the mean and variance, respectively, of
the sample). The Kolmogorov-Smirnov test of the hypothesis
$H_0: F(x) = F_0(x)$, where

$$F_0(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{80.8\pi}} \exp\left[-\frac{(t-15.5)^2}{80.8}\right] dt,$$

would not reject at any reasonable level of significance
(max $|F_n(x_{(j)}) - F_0(x_{(j)})|$ = .179; the critical value for
$\alpha$ = .20 is .266). Now $X_{(Y+1)}$ = 5.8. Since $F_0'(x)$ is monotone
nondecreasing on [0, 15.5], it is not unreasonable to proceed as if F'(x) is
monotone nondecreasing at least on [0, 5.8] = $[0, X_{(Y+1)}]$ and to apply
the MDF technique to estimate the reliability for the mission T = 3.2 hours,
as has been done in Example 1.
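
The Kolmogorov-Smirnov statistic quoted in Example 3 can be approximated with the following Python sketch. It is an illustration only: the function names are hypothetical, the normal distribution function is evaluated through the error function, and the statistic is taken as the largest gap between the hypothesized distribution function and the empirical step function on either side of each observation. It gives a value near .18, close to the .179 in the text and well under the .266 critical value.

```python
import math


def normal_cdf(x, mean, variance):
    """Distribution function of a normal variable with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical step function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    largest = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # compare with the step function just after and just before x
        largest = max(largest, abs(i / n - f0), abs(f0 - (i - 1) / n))
    return largest


times_to_failure = [11.9, 5.8, 8.1, 13.2, 12.7, 12.6, 25.6, 20.2, 9.2, 20.6,
                    14.2, 17.8, 19.8, 28.1, 12.2]
CRITICAL_VALUE_ALPHA_20 = 0.266  # n = 15, as quoted in Example 3

d_n = ks_statistic(times_to_failure, lambda x: normal_cdf(x, 15.5, 40.4))

if __name__ == "__main__":
    print(round(d_n, 3), d_n < CRITICAL_VALUE_ALPHA_20)
```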

To the author's knowledge, there are no readily adaptable goodness of
fit type tests available for the situation in which censored data is involved.
In this case, justification for use of the MDF technique, or any other
technique, must necessarily be based on past experience, on examination of the
censored data and, to a large extent, on faith.

The next section discusses seven Monte Carlo studies which were conducted
to investigate the behavior of the MDF estimator. Three populations, Weibull,
Uniform and Exponential, were considered. The density function for the Weibull
population was monotone increasing on the interval [0, 10]; the density
function for the Uniform distribution is, of course, monotone nondecreasing
on its whole domain. However, the density function for the Exponential
distribution is monotone decreasing on its whole domain, so that the MDF
technique is only an approximate technique for this case. It will be seen
that, in spite of the departure of the exponential case from the MDF
requirement (monotone nondecreasing density function), the MDF technique
generally provided acceptable results in the two Exponential studies
conducted.

4.  MONTE CARLO STUDIES. In order to obtain an indication of the behavior of
the MDF estimator and of the sensitivity of the MDF technique to departure
from the monotonicity assumption, seven Monte Carlo studies were conducted
as follows: Two studies were based on sampling from a Weibull population
with distribution function $F(x) = 1 - e^{-\lambda x^{\beta}}$ (i.e.
$F'(x) = \lambda\beta x^{\beta-1} e^{-\lambda x^{\beta}}$, $\lambda$ = .005,
$\beta$ = 2). The mission considered was T = 3.2; since F(3.2) = .05,
the mission reliability was 1 - F(T) = .95. For Weibull 1, 100 sets of
15 observations each were obtained; for Weibull 2, 100 sets of 30
observations each were obtained.

Three studies were based on sampling from a population with the Uniform
distribution on [0,1]. For Uniform 1 and Uniform 2, the mission was T = .05
so that the true reliability was .95. Uniform 1 consisted of 100 sets of
n = 10 observations; Uniform 2 consisted of 100 sets of n = 20 observations.
For Uniform 3, the mission was T = .15 so that the true reliability was .85;
100 sets of n = 10 observations were drawn for Uniform 3.

Finally, two studies were based on sampling from an Exponential population
with distribution function $F(x) = 1 - e^{-\lambda x}$ (i.e.
$F'(x) = \lambda e^{-\lambda x}$, $\lambda$ = .01).
The mission considered was T = 5.129; as with the Weibull studies, since
F(5.129) = .05, the reliability was .95. For Exponential 1, 100 sets of
n = 20 observations were obtained; for Exponential 2, 100 sets of n = 45
observations were obtained.
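
The true reliabilities quoted for the three populations follow directly from the stated distribution functions. The Python sketch below checks them; the function names are hypothetical, and the Weibull mode formula is the standard one for this parameterization, included only to show where the density stops increasing.

```python
import math


def weibull_cdf(x, lam=0.005, beta=2.0):
    """F(x) = 1 - exp(-lam * x**beta), the Weibull population of the studies."""
    return 1.0 - math.exp(-lam * x**beta)


def exponential_cdf(x, lam=0.01):
    """F(x) = 1 - exp(-lam * x), the Exponential population of the studies."""
    return 1.0 - math.exp(-lam * x)


def uniform_cdf(x):
    """F(x) = x on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def weibull_density_mode(lam=0.005, beta=2.0):
    """Point up to which the Weibull density is increasing (requires beta > 1)."""
    return ((beta - 1.0) / (beta * lam)) ** (1.0 / beta)


missions = {
    "Weibull 1, 2": (weibull_cdf, 3.2),
    "Uniform 1, 2": (uniform_cdf, 0.05),
    "Uniform 3": (uniform_cdf, 0.15),
    "Exponential 1, 2": (exponential_cdf, 5.129),
}

if __name__ == "__main__":
    for study, (cdf, mission) in missions.items():
        print(study, "reliability:", round(1.0 - cdf(mission), 4))
    print("Weibull density increases up to x =", weibull_density_mode())
```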

For  each  set  of  observations  in  each  of  the  seven  studies,  the  MDF  .90 
lower  confidence  estimate  was  obtained  as  described  in  Section  3.  Further¬ 
more,  for  each  set  of  observations  in  each  study,  the  binomial  .90  lower 
confidence  bound  was  determined.  The  results  obtained  by  these  two  methods 
of  estimation  are  compared  in  summary  form  in  Tables  2  and  3.  Also,  for 
each  set  of  observations  in  each  study  the  MDF  estimate  was  compared  with  the 
binomial  estimate  for  proximity  (at  the  third  decimal  place)  to  the  true 
reliability.  The  results  of  this  proximity  evaluation  are  presented  in 
Table  4. 
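
A study of the Uniform 1 type can be sketched in Python as follows. This is an illustrative re-creation under stated assumptions, not the author's simulation: the random seed, the default trial count, the function names, and the rule that a system fails the mission when its value is below T are all choices made here. The constant k is the Table 1 entry for n = 10 at the .90 level, and proximity is compared without rounding to the third decimal place.

```python
import random
from math import comb

K_90_10 = 1.16399  # k(.90, 10) from Table 1


def binomial_upper_bound(y, n, gamma, iterations=100):
    """C(y): the p solving sum_{f<=y} comb(n,f) p^f (1-p)^(n-f) = 1 - gamma."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        cdf = sum(comb(n, f) * mid**f * (1.0 - mid) ** (n - f)
                  for f in range(y + 1))
        if cdf > 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simulate_uniform(trials=100, n=10, mission=0.05, gamma=0.90,
                     k=K_90_10, seed=1967):
    """Return (MDF error rate, binomial error rate, share of sets where MDF is closer)."""
    rng = random.Random(seed)
    c = [binomial_upper_bound(y, n, gamma) for y in range(n)]
    true_reliability = 1.0 - mission        # Uniform on [0, 1]
    mdf_errors = binomial_errors = mdf_closer = used = 0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        y = sum(1 for x in xs if x < mission)
        if y == n:                          # X_(Y+1) undefined; set is skipped
            continue
        used += 1
        binomial = 1.0 - c[y]
        mdf = 1.0 - k * c[y] * mission / xs[y]
        mdf_errors += mdf > true_reliability
        binomial_errors += binomial > true_reliability
        mdf_closer += (abs(mdf - true_reliability)
                       < abs(binomial - true_reliability))
    return mdf_errors / used, binomial_errors / used, mdf_closer / used


if __name__ == "__main__":
    print(simulate_uniform())
```

With only 100 sets, as in the studies, the observed error proportion fluctuates noticeably from run to run; a larger trial count gives a steadier picture of how far it sits below 1 - gamma.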

From Tables 2, 3 and 4 it is seen that use of the MDF technique
resulted in substantially better estimates of the true reliability than did
the binomial method in the five cases with the smallest sample sizes: Weibull
1 (n=15), Uniform 1 (n=10), Uniform 2 (n=20), Uniform 3 (n=10) and Exponential
1 (n=20). Further, in none of these cases did the observed proportion of
errors (estimates in excess of the true reliability) made using the MDF
technique exceed the allowable .10, despite the fact that the MDF technique
is only an approximate technique for the exponential case. Also, the magnitude
of the errors was relatively small in general, as indicated by the proximity
of the error median to the true reliability in each case.

Consider now the results of Weibull 2 and Exponential 2: Tables 2, 3 and
4 show that although the superiority of the MDF technique is not as pronounced
in these cases as with the five smaller sample cases, it is nevertheless
evident; further, Tables 3 and 4 indicate that the MDF provides better
estimates in these cases (Weibull 2, Exponential 2) often enough to justify at
least calculating the MDF estimate to determine whether it gives a larger
value than the corresponding binomial estimate.

The MDF technique led to 12 erroneous estimates (>.95) in Exponential 2.
Although this proportion exceeds 1 - $\gamma$ = .10, it is seen from Table 2
that the degree of departure from the true reliability is not excessive in
three cases (.952, .953, .953). Further, only 5 of the erroneous estimates
exceeded the true reliability by more than .01. Again, it is pointed out
that the MDF technique is only approximate for the Exponential case. However,
it is clear that the more nearly the Exponential distribution function involved
is approximated by a Uniform distribution function over the range of interest
(i.e. over the interval $[0, X_{(Y+1)}]$), the smaller will be the chance of
obtaining erroneous estimates.

To provide an indication of how the MDF technique compares with two other
commonly used estimation techniques, the following studies were conducted:

1) For each set of observations in Weibull 1 and Exponential 1, a .90
confidence lower bound (for the reliability) was obtained using the method
described by Epstein in [4] (which assumes Exponentiality) for the
non-replacement situation with data censored at the third order statistic.
2) The same technique, with data censored at the fifth order statistic, was
used to obtain a .90 confidence lower bound (for the reliability) for each set
of observations in Exponential 2. The results of studies 1) and 2) are
summarized in Table 5.

3) For 33 randomly selected trials from Weibull 1 and 32 randomly selected
trials from Weibull 2 the technique described by Johns and Lieberman in [5],
with data censored at the seventh order statistic, was used to obtain .90
confidence lower bounds for the reliability. The results of these studies are
summarized in Table 6.
TABLE 1.  VALUES OF k(γ,n) FOR SELECTED CONFIDENCE
LEVELS γ AND SAMPLE SIZES n


    n       γ = .90     γ = .95     γ = .99

    5       1.13222     1.10195     1.05967
    6       1.14160     1.11117     1.06796
    7       1.15098     1.12040     1.07787
    8       1.15595     1.12583     1.08310
    9       1.16093     1.13125     1.08958
   10       1.16399     1.13369     1.09343
   11       1.16706     1.13613     1.09729
   12       1.16836     1.13857     1.09907
   13       1.16956     1.14101     1.10086
   14       1.17049     1.14198     1.10194
   16       1.17215     1.14393     1.10441
   18       1.17382     1.14588     1.10691
   20       1.17548     1.14782     1.10923
   22       1.17644     1.14887     1.11053
   24       1.17739     1.14992     1.11179
   26       1.17834     1.15097     1.11287
   28       1.17910     1.15178     1.11337
   30       1.17986     1.15259     1.11431
   35       1.18177     1.15462     1.11691
   40       1.18236     1.15529     1.11770
   45       1.18296     1.15596     1.11850
   50       1.18320     1.15626     1.11874
   60       1.18367     1.15687     1.11974
   70       1.18415     1.15740     1.12033
   80       1.18477     1.15807     1.12089
   90       1.18501     1.15835     1.12144
  100       1.18525     1.15861     1.12176
  200       1.18653     1.16004     1.12344
TABLE 2.  (Continued)


STUDY           SAMPLE    INTERVAL         BINOMIAL    MDF
                SIZE

Uniform 1         10      (.448, .550]         8         1
                          (.550, .663]        35         5
                          (.663, .794]        57        23
                          (.794, .850]         0        20
                          (.850, .900]         0        23
                          (.900, .950]         0        25
                          (.950, 1.00]         0         3

    Erroneous MDF estimates: .953, .955, .962

Uniform 2         20      ≤ .639               2         1
                          (.639, .696]         2         0
                          (.696, .755]        12         1
                          (.755, .819]        51        19
                          (.819, .891]        33        33
                          (.891, .950]         0        40
                          (.950, 1.00]         0         6

    Erroneous MDF estimates: .951, .952, .952, .956, .958, .976

Exponential 1     20      ≤ .639               0         0
                          (.639, .696]         3         3
                          (.696, .755]        21         4
                          (.755, .819]        36        12
                          (.819, .891]        40        28
                          (.891, .950]         0        45
                          (.950, 1.00]         0         8

    Erroneous MDF estimates: .951, .95?, .952, .954, .956, .960, .962, .969
TABLE 2.  (Continued)


STUDY           SAMPLE    INTERVAL         BINOMIAL    MDF
                SIZE

Exponential 2     45      ≤ .804               4         4
                          (.804, .830]         6         1
                          (.830, .858]        18        11
                          (.858, .886]        35        18
                          (.886, .916]        25        26
                          (.916, .950]        12        28
                          (.950, 1.00]         0        12

    Erroneous MDF estimates: .952, .953, .953, .956, .957, .960, .960,
                             .963, .965, .965, .967, .972

Uniform 3*        10      ≤ .354               4         1
                          (.354, .448]        20         5
                          (.448, .550]        24        13
                          (.550, .663]        35        31
                          (.663, .794]        17        29
                          (.794, .850]         0        16
                          (.850, .900]         0         5
                          (.900, 1.00]         0         0

    Erroneous MDF estimates: .859, .863, .865, .865, .885

*The true reliability in this case was .850.
TABLE 3.  SUMMARY COMPARISON OF MDF AND BINOMIAL LOWER .90 CONFIDENCE
BOUNDS: MEDIANS, MEANS, ERROR PERCENTAGES AND ERROR MEDIANS
FOR 100 SETS OF OBSERVATIONS FOR EACH STUDY

*Recall that the true reliability in this case was .85. In all other cases,
the true reliability was .95.


TABLE 4.  PROXIMITY* COMPARISON OF MDF AND BINOMIAL
LOWER .90 CONFIDENCE ESTIMATES


                      TECHNIQUE
STUDY              MDF    BINOMIAL    TIE

Weibull 1           74       25         1
Weibull 2           59       40         1
Uniform 1           86       13         1
Uniform 2           80       19         1
Exponential 1       85       12         3
Exponential 2       55       37         8
Uniform 3           67       31         2

*For example, in Weibull study 1, the MDF technique gave an
estimate which was closer to the true reliability than the
corresponding binomial estimate in 74 instances, the binomial
method gave the closer estimate in 25 cases and there was one
tie.
*All erroneous estimates are based on extrapolation in Table 1 of [5].


APPENDIX 


This section presents a proof of Proposition 1 upon
which the MDF estimation technique is based. Proposition
2, which is proved in Reference 1, is restated here for
ease of reference.

Proposition 2. Let a continuous nonnegative random
variable X have probability density function F'(x) and
distribution function F(x). Suppose that F'(x) is monotone
nondecreasing on an interval $[0, \xi_p]$. If $M \ge 1$ and
$MT \in [0, \xi_p]$, then $F(MT) \ge M F(T)$.
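
The inequality in Proposition 2 can be motivated by a convexity argument. The following is a sketch of one such argument, not the proof given in Reference 1, and it uses the fact that F(0) = 0 for a continuous nonnegative random variable. A nondecreasing density makes F convex on the interval, so for $M \ge 1$ and MT in the interval,

$$F(T) = F\!\left(\tfrac{1}{M}\,MT + \left(1 - \tfrac{1}{M}\right)\cdot 0\right) \le \tfrac{1}{M}\,F(MT) + \left(1 - \tfrac{1}{M}\right)F(0) = \tfrac{1}{M}\,F(MT),$$

which rearranges to $F(MT) \ge M F(T)$.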

Proposition 1. Let Y be the number of mission
failures in n trials. Let C(Y) be the solution to

$$\sum_{f=0}^{Y} \binom{n}{f} [C(Y)]^{f} [1 - C(Y)]^{n-f} = 1 - \gamma;$$

let $M(Y) = X_{(Y+1)}/T$, where T is the mission; and let
$k(\gamma,n)$ be determined by the equation

$$\sum_{f=0}^{n^*} \binom{n}{f} [k(\gamma,n)C(0)]^{f} [1 - k(\gamma,n)C(f)]^{n-f} = 1 - \gamma,$$

where $n^* \le n$, $1 - k(\gamma,n)C(n^*) > 0$ and
$1 - k(\gamma,n)C(n^*+1) \le 0$. If F'(x) is monotone
nondecreasing on $[0, X_{(Y+1)}]$, then

$$\Pr\left\{1 - k(\gamma,n)\frac{C(Y)}{M(Y)} \le 1 - F(T)\right\} \ge \gamma.$$

Proof. We show that
$\Pr\left\{1 - k(\gamma,n)\frac{C(Y)}{M(Y)} \ge 1 - F(T)\right\} \le 1 - \gamma$.
We have that

$$1 - k(\gamma,n)\frac{C(Y)}{M(Y)} \ge 1 - F(T) \iff 1 - \frac{C(Y)}{M(Y)} \ge 1 - F(T) + \left[1 - \frac{1}{k(\gamma,n)}\right]F(T).$$

Let $1 - 1/k(\gamma,n) = k^*$. Thus, the probability that the MDF estimator
actually exceeds the reliability is equal to the probability that
$1 - C(Y)/M(Y)$ exceeds the reliability by an amount not less than
$k^*F(T)$. Thus, we want to show that

$$\Pr\left\{1 - \frac{C(Y)}{M(Y)} \ge 1 - F(T) + k^*F(T)\right\} \le 1 - \gamma.$$

To do this, we consider two cases: Case 1: $F(T) \le k(\gamma,n)C(0)$;
Case 2: $F(T) > k(\gamma,n)C(0)$.

Case 1. $F(T) \le k(\gamma,n)C(0)$.

$$\Pr\left\{1 - \frac{C(Y)}{M(Y)} \ge 1 - F(T) + k^*F(T)\right\}
= \sum_{Y=0}^{n-1} \Pr\left\{Y \text{ failures and } X_{(Y+1)} \ge \left[\frac{k(\gamma,n)C(Y)}{F(T)}\right]T\right\}.$$

Now,

$$\Pr\left\{Y \text{ failures and } X_{(Y+1)} \ge \left[\frac{k(\gamma,n)C(Y)}{F(T)}\right]T\right\}
= \Pr\left\{Y \text{ failures and } F(X_{(Y+1)}) \ge F\!\left(\left[\frac{k(\gamma,n)C(Y)}{F(T)}\right]T\right)\right\}.$$

Since $F(T) \le k(\gamma,n)C(0)$, $F(T) \le k(\gamma,n)C(Y)$ for
$Y \in \{0,1,2,\ldots\}$. Thus $k(\gamma,n)C(Y)/F(T) \ge 1$, so that by
Proposition 2

$$F\!\left(\left[\frac{k(\gamma,n)C(Y)}{F(T)}\right]T\right) \ge \left[\frac{k(\gamma,n)C(Y)}{F(T)}\right]F(T) = k(\gamma,n)C(Y).$$

Therefore

$$\Pr\left\{Y \text{ failures and } F(X_{(Y+1)}) \ge F\!\left(\left[\frac{k(\gamma,n)C(Y)}{F(T)}\right]T\right)\right\}
\le \Pr\left\{Y \text{ failures and } F(X_{(Y+1)}) \ge k(\gamma,n)C(Y)\right\}$$

$$= \Pr\left\{Y \text{ failures and } X_{(Y+1)} \ge \xi_{k(\gamma,n)C(Y)}\right\}
= \binom{n}{Y}[F(T)]^{Y}[1 - k(\gamma,n)C(Y)]^{n-Y},$$

when $k(\gamma,n)C(Y) \le 1$ (i.e. for $Y \le n^*$). If
$k(\gamma,n)C(Y) > 1$ (i.e. for $Y > n^*$), the probability on the left is 0.
Hence,

$$\Pr\left\{1 - \frac{C(Y)}{M(Y)} \ge 1 - F(T) + k^*F(T)\right\} \le \sum_{f=0}^{n^*} \binom{n}{f}[F(T)]^{f}[1 - k(\gamma,n)C(f)]^{n-f}.$$

Now, $F(T) \le k(\gamma,n)C(0)$. Thus,

$$\sum_{f=0}^{n^*} \binom{n}{f}[F(T)]^{f}[1 - k(\gamma,n)C(f)]^{n-f} \le \sum_{f=0}^{n^*} \binom{n}{f}[k(\gamma,n)C(0)]^{f}[1 - k(\gamma,n)C(f)]^{n-f}.$$

By hypothesis, the sum on the right equals $1 - \gamma$; thus when
$F(T) \le k(\gamma,n)C(0)$,

$$\Pr\left\{1 - \frac{C(Y)}{M(Y)} \ge 1 - F(T) + k^*F(T)\right\}
= \Pr\left\{1 - k(\gamma,n)\frac{C(Y)}{M(Y)} \ge 1 - F(T)\right\} \le 1 - \gamma.$$

Case 2. $F(T) > k(\gamma,n)C(0)$. In this case there
exists $f = f^*$ such that $F(T) \le k(\gamma,n)C(f^*)$ and
$F(T) > k(\gamma,n)C(f^* - 1)$. We will thus have

$$\Pr\left\{1 - \frac{C(Y)}{M(Y)} \ge 1 - F(T) + k^*F(T)\right\}
\le \Pr\{f^* \text{ or fewer failures}\}
+ \Pr\left\{f^*+1 \text{ failures and } X_{(f^*+2)} \ge \left[\frac{k(\gamma,n)C(f^*+1)}{F(T)}\right]T\right\} + \cdots$$

$$+ \Pr\left\{n^* \text{ failures and } X_{(n^*+1)} \ge \left[\frac{k(\gamma,n)C(n^*)}{F(T)}\right]T\right\}$$

$$\le \sum_{f=0}^{f^*} \binom{n}{f}[F(T)]^{f}[1 - F(T)]^{n-f} + \sum_{f=f^*+1}^{n^*} \binom{n}{f}[F(T)]^{f}[1 - k(\gamma,n)C(f)]^{n-f} = H(F(T)).$$

The maximum value the second sum in H(F(T)) can take on occurs for
$F(T) = k(\gamma,n)C(f^*)$, since $F(T) \le k(\gamma,n)C(f)$ for
$f \ge f^*$. Thus, H(F(T)) is dominated by the function G(F(T)), where

$$G(F(T)) = \sum_{f=0}^{f^*} \binom{n}{f}[F(T)]^{f}[1 - F(T)]^{n-f} + \sum_{f=f^*+1}^{n^*} \binom{n}{f}[k(\gamma,n)C(f^*)]^{f}[1 - k(\gamma,n)C(f)]^{n-f}.$$

Furthermore,

$$\frac{dG}{dF(T)} = -n[1 - F(T)]^{n-1} + \sum_{f=1}^{f^*} \binom{n}{f}\left\{ f\,F(T)^{f-1}[1 - F(T)]^{n-f} - (n-f)F(T)^{f}[1 - F(T)]^{n-f-1} \right\}$$

$$= -n[1 - F(T)]^{n-1} + \sum_{f=1}^{f^*} \binom{n}{f}[F(T)]^{f-1}[1 - F(T)]^{n-f-1}\{f - nF(T)\}.$$

Now recall that for $f \in \{0,1,2,\ldots,f^*\}$, $F(T) \ge k(\gamma,n)C(f^*)$,
and $C(f^*) \ge f^*/n$. Thus, since $k(\gamma,n) \ge 1$, $nF(T) \ge f^*$,
so that $f - nF(T) \le 0$ for $f \in \{0,1,2,\ldots,f^*\}$. Hence
$dG/dF(T)$ is negative; i.e. H(F(T)) is dominated by the
monotone decreasing function G(F(T)). The value of G(F(T))
when $F(T) = k(\gamma,n)C(0)$ is

$$G(k(\gamma,n)C(0)) = \sum_{f=0}^{f^*} \binom{n}{f}[k(\gamma,n)C(0)]^{f}[1 - k(\gamma,n)C(0)]^{n-f}
+ \sum_{f=f^*+1}^{n^*} \binom{n}{f}[k(\gamma,n)C(f^*)]^{f}[1 - k(\gamma,n)C(f)]^{n-f}.$$

However, it is clear that when $F(T) = k(\gamma,n)C(0)$, $f^* = 0$,
so that

$$G(k(\gamma,n)C(0)) = \sum_{f=0}^{n^*} \binom{n}{f}[k(\gamma,n)C(0)]^{f}[1 - k(\gamma,n)C(f)]^{n-f}.$$

Thus $H(F(T)) \le G(k(\gamma,n)C(0))$. By hypothesis,
$G(k(\gamma,n)C(0)) = 1 - \gamma$, and the theorem is proved.
THE DEVELOPMENT OF A PROBABILISTIC MODEL FOR
ACOUSTIC SOUND RANGING


Robert  P.  Lee 

Atmospheric Sciences Laboratory
White  Sands  Missile  Range,  New  Mexico 


It  was  decided  last  year  that  the  Environmental  Sciences  Division 
of  the  Atmospheric  Sciences  Laboratory  at  White  Sands  Missile  Range 
would  study  the  possibility  of  improving  acoustic  sound  ranging  by 
applying  more  elaborate  meteorological  corrections. 

Accordingly, a very elaborate test was set up. Charges consisting
of two and one half pounds of TNT were to be exploded at twenty
minute intervals and the resulting acoustic waves picked up by eighteen
broadband microphones placed so as to give various microphone
configurations. The outputs from these microphones were fed to three
magnetic tape recorders, eight signals to one, five signals to each
of the other two. The locations for the microphones were laid out
very carefully and a final first order survey run to precisely
determine the microphone coordinates.

It was felt that the microphone coordinates were not in error
by more than one foot with respect to each other and to the firepoint.
By digitizing the analog tapes at one millisecond intervals, since
sound propagates approximately one foot per millisecond, the timing
and microphone location errors should be of the same order of magnitude.
It was hoped that this would remove timing and microphone placement
errors as sources of error but, if necessary, the digitizing rate could
be increased ten-fold and more sophisticated methods could be used to
compensate for microphone placement errors.

Figure 1 shows the geometry of a six microphone acoustic ranging
array and the equations to be solved. A derivation of these equations
is given in the Appendix to "Probabilistic Model for Acoustic Sound
Ranging", ECOM-5159, October 1967 by the author. Briefly, assuming
the wave front to be a plane wave at ten degrees centigrade with no
wind, the time difference, $t_i$, in acoustic signal arrival times at
two adjacent microphones divided by s, the acoustic travel time
between these adjacent microphones, gives the sine of the angle,
$\theta_i$, between the normal to the plane wave and the normal to the
microphone array. Any two of these rays can then be solved for
intersection point. Five such rays will intersect at ten points. The
average of these ten points will give a preliminary location from which
approximate distances, $R_i$, to the midpoints between the microphones
can be calculated. Next, corrections for wind, temperature, and wave
front curvature are applied to the $t_i$ and the computations repeated.
For most of the area in front of the array this is sufficient.
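
The plane-wave relation described above, in which the arrival-time difference divided by the inter-microphone travel time gives the sine of the arrival angle, can be sketched in Python as follows. The function names and the sample numbers are hypothetical and chosen only for illustration; the meteorological corrections and the ray-intersection step are not attempted here.

```python
import math


def arrival_angle_degrees(time_difference, mic_travel_time):
    """Angle between the wave normal and the array normal, from sin(theta) = t / s."""
    ratio = time_difference / mic_travel_time
    if abs(ratio) > 1.0:
        raise ValueError("time difference exceeds the inter-microphone travel time")
    return math.degrees(math.asin(ratio))


def intersection_count(rays):
    """Number of pairwise intersections among the given number of rays."""
    return math.comb(rays, 2)


if __name__ == "__main__":
    # Illustrative values: a 0.5 s arrival difference over a 1.0 s travel time.
    print(arrival_angle_degrees(0.5, 1.0))
    # Five rays, as in the six-microphone array, intersect pairwise:
    print(intersection_count(5))
```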

As the tapes were digitized, plotted, and first motion read, it became
evident that all was not well. Table 1 shows a tabulation of the
differences in signal arrival times at the five microphones (not in a
straight line) recorded at Van 6015. As mentioned above the analog
signals were digitized at 1000 samples per second and first motion was, in
general, well defined. It was estimated that the one sigma time reading
error should fall between one and two milliseconds. This is almost an
order of magnitude less than the seven to ten milliseconds rms variation
shown in Table 1. Table 2 summarizes the recordings made at another five
microphones during the same period of time at Van 6026. Again the rms
variation is too great. In addition, a statistical analysis will show
the essential randomness of these residuals.

A few moments' reflection will show that nothing we have presented
so far will account for these results. The firing point and microphone
locations were fixed, and errors in these would cause no variation in
the time recordings. Temperature changes would apply equally to all
parts of the acoustic wave front, and variations from microphone to
microphone should be highly correlated. The same should be true if
the wind across the array changed. During the time this data was
assembled, acoustic ray trace calculations were underway to find out if
the observed time differences could be checked by this method and to
study the effect of the ray path on the meteorological corrections to
be applied. Very typical results are shown in Table 3. Although complete
horizontal stratification was assumed in the winds and temperatures,
a common assumption in acoustic ray tracing, each ray travels a path
characterized by different parameters because of the large azimuth
changes from one ray to another and the varying distances. The effective
temperature for each path was obtained by dividing the distance from the
fire point to the microphone by the time, as shown by acoustic ray tracing,
for the sound to propagate between these two points and then determining a
mean temperature based on this propagation velocity. Similarly, the
wind displacement normal and parallel to the baseline for each increment
of ray path was summed and divided by the elapsed time to give
the effective wind components.
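
The two reductions just described can be sketched in a few lines of Python. This is only an illustration: the linear relation between sound speed and temperature (331.3 + 0.606 T m/s) is a common approximation assumed here, not a formula taken from the paper, and the function names and the layout of the wind increments are hypothetical.

```python
def effective_temperature(distance_m, travel_time_s):
    """Mean temperature (deg C) implied by a path's mean propagation velocity.

    Assumes c = 331.3 + 0.606 * T (m/s); the paper does not state its formula.
    """
    speed = distance_m / travel_time_s
    return (speed - 331.3) / 0.606


def effective_wind(increments, elapsed_s):
    """Effective wind components (normal, parallel) to the baseline.

    `increments` is a list of (normal_m, parallel_m) wind displacements,
    one pair per increment of ray path (hypothetical data layout).
    """
    normal = sum(d[0] for d in increments) / elapsed_s
    parallel = sum(d[1] for d in increments) / elapsed_s
    return normal, parallel
```

Applied to each of the rays in a trace, these give one effective temperature and one effective wind vector per microphone, which is the form in which Table 3 is described.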

Reexamining the equations of Figure 1 with the data from Table 3
in mind, it can be seen that implicit in these equations is the assumption
that there exists a unique temperature and a unique wind velocity
vector valid for the entire area in front of the microphone array and
that if these were exactly known proper corrections could be made for
wind and temperature. Table 3 indicates that in addition to errors in
estimating the effective temperature and effective wind component parallel
to the baseline (the wind component normal to the baseline does not
appear in the equations) and to random timing errors and microphone
placement errors, there exist errors due to the variations in effective
temperature and in both components of the effective wind velocity vector
from ray path to ray path.

The ratio of the rms acoustic source location error in meters to
the rms Gaussian timing error in milliseconds was examined by choosing
a point in the XY plane, calculating the acoustic propagation time to
each microphone, adding to this elapsed time a Gaussian random number
(with zero mean and specified standard deviation), substituting the
perturbed times into the sound ranging equations, and calculating the
apparent acoustic source location. After storing the source location
errors in X and Y and the timing errors, a new set of timing errors
was drawn and the process repeated until 100 samples had been produced.
The ratio of the rms output error to the rms input error was then
determined for that point in the XY plane. Calculations were made
at sufficient points to permit plotting the error contour curves
shown in Figure 2.
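
The sampling procedure for a single point of the XY plane can be sketched as follows. This is a simplified illustration, not the author's program: it uses only the uncorrected plane-wave step described earlier (rays from the midpoints of adjacent microphone pairs, averaged over all pairwise intersections), it assumes the microphones lie on a straight baseline along the x-axis, and it assumes the sound speed 331.3 + 0.606 T m/s. The function names, the array spacing, and the seed are hypothetical.

```python
import itertools
import math
import random


def sound_speed(temp_c):
    """Speed of sound in m/s (linear approximation, assumed here)."""
    return 331.3 + 0.606 * temp_c


def locate(mics_x, arrivals, c):
    """Plane-wave estimate of a source in front (y > 0) of mics on the x-axis."""
    rays = []
    for k in range(len(mics_x) - 1):
        s = (mics_x[k + 1] - mics_x[k]) / c      # travel time between the pair
        t = arrivals[k] - arrivals[k + 1]        # arrival-time difference t_i
        sin_th = max(-1.0, min(1.0, t / s))      # sin(theta_i) = t_i / s
        cos_th = math.sqrt(1.0 - sin_th ** 2)
        rays.append((0.5 * (mics_x[k] + mics_x[k + 1]), sin_th, cos_th))
    xs, ys = [], []
    for (m1, s1, c1), (m2, s2, c2) in itertools.combinations(rays, 2):
        a = (m2 - m1) * c2 / (s1 * c2 - s2 * c1)  # distance along the first ray
        xs.append(m1 + a * s1)
        ys.append(a * c1)
    return sum(xs) / len(xs), sum(ys) / len(ys)   # average of the intersections


def error_ratio(source, mics_x, sigma_ms, samples=100, temp_c=10.0, seed=0):
    """rms location error (m) divided by rms timing error (ms) at one point."""
    rng = random.Random(seed)
    c = sound_speed(temp_c)
    exact = [math.hypot(source[0] - mx, source[1]) / c for mx in mics_x]
    loc_sq = tim_sq = 0.0
    for _ in range(samples):
        noise = [rng.gauss(0.0, sigma_ms * 1e-3) for _ in mics_x]
        x, y = locate(mics_x, [t + e for t, e in zip(exact, noise)], c)
        loc_sq += (x - source[0]) ** 2 + (y - source[1]) ** 2
        tim_sq += sum((1e3 * e) ** 2 for e in noise) / len(noise)
    return math.sqrt(loc_sq / samples) / math.sqrt(tim_sq / samples)
```

Evaluating `error_ratio` over a grid of source points would give the raw material for a contour plot of the kind described, although the numbers from this uncorrected sketch should not be expected to reproduce Figure 2.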

The effect of microphone placement errors was examined by generating
two random numbers for each microphone location, one uniformly
distributed between 0 and 360 degrees to represent the angle with the
baseline, the other with a Gaussian distribution to represent the magnitude
of the microphone placement error. Microphone positions corresponding to
these errors in placement were calculated, as were the acoustic arrival
times to these new locations. These arrival times were then substituted
into the array equations, where they were used as if the microphones were
at the original locations. The resulting acoustic source location errors
were stored as well as the input microphone placement errors, and the
process repeated until a sample of 100 had been determined. The ratio of
the rms acoustic source location error to the rms microphone placement
error was determined for enough points to permit drawing the contours
of Figure 3.

The equations for sound ranging assume that there is an effective
mean  temperature  over  the  entire  area  in  front  of  the  array.  Since 
temperature  is  a  single  number  applied  to  the  entire  array,  a  bias 
type  of  error  results.  It  is  only  necessary  to  make  a  single  compu¬ 
tation  at  any  point  in  the  XY  plane  to  determine  the  error  in  acoustic 
source  location  per  degree  centigrade.  This  error  has  both  magnitude 
and  direction.  The  error  contours  of  Figure  4  are  based  on  magnitude 
only.  There  will  also  be  a  unique  direction  associated  with  each  point 
in  the  plane  but  no  attempt  has  been  made  to  plot  this. 

If it is assumed, as indicated in Table 3, that each ray from the
acoustic source to a microphone encounters slightly different atmospheric
conditions, then there will be small variations from ray path to
ray path in effective temperature. To obtain the error contours of
Figure 5, the mean temperature of 10°C for the entire array was
perturbed for each ray by adding a random variation drawn from a Gaussian
population having a mean of zero and a standard deviation of 0.1°C. Acoustic
arrival times based on these perturbed temperatures were then substituted
back into the sound ranging equations and the error in acoustic source
location determined. Again a sample of 100 such calculations for a
given point resulted in a reasonably stable ratio of rms output error to
rms input error.


As with temperature, the equations for sound ranging assume there
is an effective mean wind over the area between the acoustic source and
the microphone array. This is simplified by the fact that only the wind
component parallel to the baseline enters into the correction equations.
That the wind component perpendicular to the baseline cancels out is
shown in the ECOM Report 5159 referenced earlier. Since only a single
number is involved, the error is a bias type error having magnitude and
direction at each point in the plane in front of the microphones. If
an error of one meter per second in assumed wind is used, the error
contours (magnitude only) of Figure 6 result. There will be a unique
direction for each point in the plane. No attempt has been made to
show this.

When calculations are based on perturbed effective wind velocity
for each acoustic ray, it is necessary to consider variations in the wind
component normal to the baseline as well as variations in the component
parallel to the baseline. The procedure for producing the error
calculations from which Figures 7 and 8 are derived resembles that used for
temperature variations, except that now the random numbers drawn
represent component wind perturbations.

The importance of including terms representing the perturbations
in temperature and wind velocity over the area in front of the microphone
array can be seen by comparing Figure 4 with Figure 5 and Figure
6 with Figures 7 and 8. From the magnitude of the contours shown it is
obvious that a small variation from ray path to ray path in mean effective
wind or mean effective temperature will have considerably greater effect
than a similar error in estimating the mean effective wind or the mean
effective temperature. Figures 2 through 8 indicate that the following
parameter variations produce approximately equivalent acoustic source
location errors:

(1)  5 millisecond rms timing errors

(2)  2.5 meter rms microphone placement errors

(3)  1°C error in estimating effective temperature

(4)  0.1°C rms variation from one path to another in effective
temperature

(5)  1 meter per second error in estimating the component of the
effective wind parallel to the baseline

(6)  0.2 meter per second rms variation from one path to another
in the component of the effective wind parallel to the baseline

(7)  0.1 meter per second rms variation from one path to another in
the component of the effective wind normal to the baseline.

Since  effects  (4),  (6)  and  (7)  will  show  up  as  random  variations 
in  acoustic  arrival  times  at  each  of  the  microphones,  they  are  suffi¬ 
cient  to  account  for  the  random  variations  shown  in  Tables  1  and  2. 

Table 4 deserves special mention. Here three of the minimum time
rays rose to a peak height of about 40 meters, the other three to a
peak height of approximately 150 meters. With the large differences
in effective wind and effective temperature between the two groups,
the equations for sound ranging have no valid solution.


ON EXPERIMENTS CONCERNED WITH THE SAMPLING DISTRIBUTION
OF LANCHESTER'S PARAMETERS

David  R.  Howes 

U.S.  Army  Strategy  and  Tactics  Analysis  Group 
Bethesda,  Maryland 


Research modeling activities of the past five years have led to the
construction of a number of computerized war gaming models. These are
simulations of combat carried on in more or less detail, and carried out
under the guidance of player groups who prescribe computer input both at
the start of and during the simulation. Such a simulation might represent
the activities of opposing Red and Blue Divisional forces accounted for down
to the company and battery level of resolution. The simulation proceeds in
accordance with player input and assessment rules internal to the computer
which are generally probabilistic in nature.

It  is  often  desired  to  use  such  gaming  models  to  compare  the  relative 
effectiveness  of  some  aspect  of  organization,  tactics,  equipment,  supply,  or 
armament of a combat organisation. The procedure would be to game two or
more  alternative  possibilities  and  then  somehow  to  compare  the  results  obtained 
from  the  game.  In  this  case,  the  complex  probabilistic  nature  of  these  models 
raises  questions  of  experimental  design,  since  the  outcome  of  a  given  game 
would  vary  over  repeated  trials. 

In  a  deterministic  war  game,  the  problem  of  variability  does  not  exist. 

For example, consider the Lanchester "Laws" of combat. These are:

1.  Lanchester's first linear law:

$$ dx/dt = k_1, \qquad dy/dt = k_2 $$

In this case, the attrition of the strength of the force x, dx/dt,
is proportional to some environmental constant $k_1$, unrelated to the
strengths of the combatants.

2.  Lanchester's square law:

$$ dx/dt = k_1 y, \qquad dy/dt = k_2 x $$

Here, the attrition of each force is proportional to the strength of the
opponent.

3.  Lanchester's second linear law:

$$ dx/dt = k_1 x y, \qquad dy/dt = k_2 x y $$

The first and third laws are both called linear since in both cases

$$ dx/dy = \text{(a constant)} $$

and therefore the graph of x against y is linear.
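
A short numerical sketch makes the distinction between the linear and square laws concrete. It integrates the three laws with simple Euler steps and compares the overall slope of x against y. The sign convention (strengths decrease, so the rates are subtracted), the step size, and the function names are assumptions made for the illustration; they are not part of the paper.

```python
def simulate(law, x0, y0, k1, k2, dt=0.01, steps=200):
    """Euler integration of a Lanchester law; returns the (x, y) trajectory.

    Attrition is applied as a loss, i.e. x decreases at rate dx/dt
    (an assumed sign convention).
    """
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        if law == "first_linear":
            dx, dy = k1, k2
        elif law == "square":
            dx, dy = k1 * y, k2 * x
        elif law == "second_linear":
            dx, dy = k1 * x * y, k2 * x * y
        else:
            raise ValueError(law)
        x, y = x - dx * dt, y - dy * dt
        path.append((x, y))
    return path


def overall_slope(path):
    """Change in x divided by change in y between the first and last points."""
    (xa, ya), (xb, yb) = path[0], path[-1]
    return (xb - xa) / (yb - ya)
```

For either linear law the slope equals $k_1/k_2$ whatever the starting strengths, whereas for the square law it depends on the ratio y/x along the way, which is why the graph of x against y is no longer a straight line.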

In the case of a supposed Lanchester model, experimental design would be
simple. It is only necessary to play a game through two time periods. This
will provide the data necessary to calculate $k_1$ and $k_2$.

However, in probabilistic models the problem is far more complex. No
explicit mathematical model seems capable of being prescribed.

In a recent paper (1), David G. Smith has explored the possibility of
relating Lanchester theory to the study of simulation results. The idea is
to suppose that the underlying process of warfare (or war simulation) is
reasonably depicted by a Lanchester formulation, but that the results actually
observed in terms of measurable characteristics of war (in Smith's case,
casualties) are taken from a sampling probability distribution around the pure
Lanchester form.

Smith's results for the linear law, for two sides with initial
strengths m and n, are given in the following notation:

    A(m,n)       Prob. that red will sustain the next casualty.
    P(x,y,m,n)   Prob. that the point (x,y) is reached from (m,n).
    P(R,m)       Prob. of a red win.

For

$$ dm/dt = -\alpha f_1(m,n), \qquad dn/dt = -\beta f_2(m,n) $$

the results are

$$ A(m,n) = \frac{\rho f_2(m,n)}{f_1(m,n) + \rho f_2(m,n)}, \qquad \rho = \beta/\alpha \qquad (1) $$

or, in the case of the linear law,

$$ A(m,n) = \rho/(1+\rho) \qquad (2) $$

and

$$ P(x,y,r,s) = A(x,y+1)\,P(x,y+1,r,s) + [1 - A(x+1,y)]\,P(x+1,y,r,s) \qquad (3) $$

leading, for the linear law, to

$$ P(x,y,m,n) = \binom{m+n-x-y}{m-x}\, p^{\,n-y} (1-p)^{\,m-x} \qquad (4) $$

where $p = \rho/(1+\rho)$ and $I_p(n,m)$ in formula (5) denotes the incomplete
beta function.

Smith has had some success in applying these formulations to the results of
war games.
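
As a check on how recursion (3) behaves under the linear law, the following sketch evaluates it directly with a constant casualty probability p and compares the result with a binomial closed form. It assumes that a step in the second coordinate occurs with probability p = A and a step in the first with probability 1 - p, and it is restricted to points with both strengths at least 1, so that no side has yet been annihilated. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


def reach_prob(x, y, m, n, p):
    """P(x, y, m, n) from recursion (3) with constant A = p (linear law)."""
    @lru_cache(maxsize=None)
    def prob(a, b):
        if a > m or b > n:
            return 0.0              # cannot start above the initial strengths
        if (a, b) == (m, n):
            return 1.0              # the starting point itself
        # arrive from (a, b+1) with prob. p, or from (a+1, b) with prob. 1 - p
        return p * prob(a, b + 1) + (1.0 - p) * prob(a + 1, b)
    return prob(x, y)


def closed_form(x, y, m, n, p):
    """Binomial form: n - y steps of one kind and m - x of the other."""
    return comb(m + n - x - y, m - x) * p ** (n - y) * (1.0 - p) ** (m - x)
```

The two agree at every interior point, since under a constant A each path from (m,n) to (x,y) has the same probability and only the number of orderings differs.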

In Smith's case, these results pertain to human casualties as simulated.
Supposing these to be of primary importance, he suggests the use of $\rho$ as a
measure of the effectiveness of m over n. Since a measure of military
effectiveness is of great importance in operations research, it is of interest
to see whether this approach can be extended to cover situations of greater
complexity.

The first difficulty is that of estimating $\rho$ in practical cases. Formula
(5) offers an apparent method, since there the problem is one of a familiar
distribution parameter. H. Weller in reference 2 provides methods of estimation.
However, this requires that the simulation continue until a side is annihilated.
Except in small-unit cases, this does not occur.

Equation (4) might seem to offer possibilities, since it is independent
of the final outcome and might be amenable to the sort of treatment prescribed
for Bernoulli processes in Raiffa and Schlaifer (Reference 3). The details need
to be worked out.


$$ \sum_{s} \binom{m+n-1}{s} \frac{\rho^{s}}{(1+\rho)^{m+n-1}} = I_p(m,n) $$


Considerable practical difficulty surrounds the construction of an
experimental plan. Using an existing computerised war gaming model, such as
STAG's LEGION model, a very great variety of initial conditions can be arranged.
If these can be standardised on some basis, the problem might be reduced to one
as shown on the diagram.

[Diagram: the m-n plane, with a starting point P and a line leading from P.]

The space spanned by the m and n axes represents the strength of the m and n
forces. A point in the space, such as P, represents a starting point for a
battle. The line leading from P represents a possible course of battle, that
is, the subsequent strengths of the two sides.

If either of Lanchester's "linear" laws were a valid representation of
the process, the straight line leading from P would be descriptive. Also,
the initial point chosen, P, would be immaterial, since the loci of all possible
battles would be a family of parallel lines. (Battles, that is, between
opponents with the same relative effectiveness.) In this case, it would be
sufficient to obtain an estimate of $\rho$, the slope of the line. A least
squares estimate based on all available points would suffice.
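
One way such a pooled estimate might be formed is sketched below: each battle is allowed its own intercept (its own starting point), and a single common slope is fitted by least squares to the strengths recorded along all battles. Regressing n on m, rather than m on n, is an arbitrary choice made here, and the function name and data layout are hypothetical.

```python
def common_slope(battles):
    """Least squares slope shared by a family of parallel lines.

    `battles` is a list of trajectories, each a list of (m, n) strength pairs.
    Each trajectory is centred on its own means, so only the slope is pooled.
    """
    sxy = sxx = 0.0
    for points in battles:
        m_bar = sum(m for m, _ in points) / len(points)
        n_bar = sum(n for _, n in points) / len(points)
        sxy += sum((m - m_bar) * (n - n_bar) for m, n in points)
        sxx += sum((m - m_bar) ** 2 for m, _ in points)
    return sxy / sxx
```

For example, two noise-free battles lying on lines of slope 2 with different starting points return a slope of 2, whatever their intercepts.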


However, if the object of the experiment is to study the appropriateness
of the linear, as well as other, models, there seems to be no prior reason for
preferring any particular starting point as an experimental tool, since it
might be argued that every P could be associated with a unique locus. Such
considerations would lead to the experimental design


Since the quarter space is unbounded, the design is arbitrarily large, not
an encouraging state of affairs.

If, on the other hand, one might suppose that the following condition cannot
reasonably exist:


Hypothesis:  Every point lies on a unique locus of battle (supposing m and n
to be true indices of force).


This suggests the following experimental design


But since m and n are interchangeable, symmetry would be expected, suggesting
the following design:

[Diagram: experimental design in the m-n plane.]

A drawback of this design seems to be that in the case of the linear law
or anything similar to it, the design is prejudicial toward starting
points close to the diagonal. To correct this, we might choose something
like:


The experimental design also requires consideration of the size of the time
interval and the number of intervals to be followed from each origin.

The foregoing seems to indicate the need for considerable research
using war gaming models to investigate the merits of various model
concepts, and to develop a methodology for conducting studies using these
models.


ESTIMATES OF P(Y<X) AND THEIR APPLICATION TO RELIABILITY PROBLEMS
FOR BOTH CONTINUOUS AND QUANTAL RESPONSE DATA

J.D.  Church  and  Bernard  Harris 
Mathematics  Research  Center,  U.S.  Army 
The  University  of  Wisconsin 
Madison,  Wisconsin 


1.  Introduction. In this manuscript, we provide a brief summary of two papers,
which will be published in more complete form elsewhere.

We assume that X and Y are independent random variables with
cumulative distribution functions $F_X(x)$ and $G_Y(y)$ respectively. The
distribution of Y is assumed to be known and a random sample of n observations
distributed as X is obtained, say $X_1, X_2, \ldots, X_n$. The objective is to
estimate $P\{Y<X\}$.

Two models for this problem are studied. In both models, $F_X(x)$ is
assumed to be an absolutely continuous cumulative distribution function. In
the case of the first model, the values of the random variables
$X_1, X_2, \ldots, X_n$ are directly observed and we refer to this as the
continuous model. In the second model, n real numbers, $y_1, y_2, \ldots, y_n$,
are selected by the experimenter, who then only acquires the information
$X_i < y_i$ or $X_i > y_i$, $i = 1, 2, \ldots, n$, from his experiment. The
case $X_i = y_i$ can be ignored, since this is an event of probability zero as
a consequence of the assumption that $F_X(x)$ is absolutely continuous. This
model is referred to as the quantal response model.

In both models we will specifically assume that both X and Y are
normally distributed.


2.  Physical Background of the Problem. Suppose that X is the strength of a
component which is subjected to a stress Y. Then, the component fails
whenever X < Y and there is no failure when Y < X. In addition, the stresses
may be expensive to sample, such as might be the case in missile flights, but
the physical characteristics of the missile system, such as the propulsive force,
angles of elevation, changes in atmospheric condition, and so on, may all have
known distributions; consequently the distribution of the stresses may be
calculated. Therefore, we have assumed that $G_Y(y)$ is known. In addition,
when the distribution of Y is known, this permits estimating $P\{Y<X\}$ by
laboratory experiments measuring the strengths of components without recourse to
extensive flight testing.

In view of this physical model, $P\{Y<X\}$ is the probability that a component
whose strength is distributed by $F_X(x)$ does not fail when subjected to the
random stress Y distributed by $G_Y(y)$. Thus, it is natural to call $P\{Y<X\}$
the reliability of the component. We denote this by R, and estimators will be
generally denoted by $\hat{R}$.


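3.  A Summary of Results for the Continuous Model. When X and Y are both
normally distributed, we can take $G_Y(y)$ to be the standard normal distribution
$\Phi(y)$. Then,

$$ R = P\{Y<X\} = \Phi\!\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right) \qquad (1) $$

where $\mu$ and $\sigma^2$ are the mean and variance of X respectively. Thus a
natural choice of estimator for R is

$$ \hat{R} = \Phi\!\left(\frac{\bar{X}}{\sqrt{1+s^2}}\right) \qquad (2) $$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$. It can be shown that
$V = \bar{X}/\sqrt{1+s^2}$ is asymptotically normally distributed with mean
$\mu/\sqrt{1+\sigma^2}$ and variance

$$ \sigma_V^2 = \frac{1}{1+\sigma^2}\left(\frac{\sigma^2}{n} + \frac{\mu^2\sigma^4}{2(n-1)(1+\sigma^2)^2}\right). \qquad (3) $$

Thus V is a consistent estimator of $\mu/\sqrt{1+\sigma^2}$ and by continuity,
it follows that $\hat{R}$ is a consistent estimator of R. From these remarks,
one can readily write down one and two-sided confidence intervals for R as
follows:

$$ P\{R > \Phi(V - \Phi^{-1}(1-\gamma)\,\hat{\sigma}_V)\} \sim 1-\gamma, \qquad (4) $$

$$ P\{\Phi(V - \Phi^{-1}(1-\gamma/2)\,\hat{\sigma}_V) < R < \Phi(V + \Phi^{-1}(1-\gamma/2)\,\hat{\sigma}_V)\} \sim 1-\gamma, \qquad (5) $$

where $\hat{\sigma}_V^2$ is obtained from (3) by replacing $\mu$, $\sigma^2$ on
the right hand side of (3) by the estimates $\bar{X}$, $s^2$. Since
$\hat{\sigma}_V^2 - \sigma_V^2 = O(n^{\,\cdot\,})$ (the exponent is illegible in
the source), (4) and (5) are satisfactory approximate confidence intervals for
R. The asymptotic distribution of $\hat{R}$ is given by

$$ P\{\Phi(V) < u\} \sim \Phi\!\left(\frac{\Phi^{-1}(u) - \mu/\sqrt{1+\sigma^2}}{\sigma_V}\right). \qquad (6) $$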

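The estimator and the two-sided interval of this section can be computed with the Python standard library alone. The sketch below assumes, as in the text, that Y is standard normal and X is normal; it uses the delta-method form of the asymptotic variance of V, and the function name and the default $\gamma$ = 0.05 are choices made for the illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def reliability_estimate(xs, gamma=0.05):
    """Return (R_hat, (lower, upper)) for R = P{Y < X}, with Y ~ N(0, 1).

    xs are observed strengths; the interval is the approximate two-sided
    1 - gamma interval built from the asymptotic normality of V.
    """
    n = len(xs)
    x_bar, s2 = mean(xs), variance(xs)          # variance() divides by n - 1
    v = x_bar / sqrt(1.0 + s2)
    var_v = (s2 / n + x_bar ** 2 * s2 ** 2
             / (2.0 * (n - 1) * (1.0 + s2) ** 2)) / (1.0 + s2)
    half = NormalDist().inv_cdf(1.0 - gamma / 2.0) * sqrt(var_v)
    phi = NormalDist().cdf
    return phi(v), (phi(v - half), phi(v + half))
```

With the strengths 1, 2, 3 (so that $\bar{X} = 2$ and $s^2 = 1$) the point estimate is $\Phi(\sqrt{2})$, about 0.921. A sample this small is far from the asymptotic regime, so the interval is indicative only.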

4.  A Summary of Results for the Quantal Response Model. For fixed
$y_1 \le y_2 \le \cdots \le y_n$, not all equal, the likelihood function is
given by

$$ L(\mu, \sigma^2) = \prod_{i=1}^{n} \Phi\!\left(\frac{y_i-\mu}{\sigma}\right)^{u_i} \left[1 - \Phi\!\left(\frac{y_i-\mu}{\sigma}\right)\right]^{1-u_i} \qquad (7) $$

where $u_i = 1$ if $X_i < y_i$ and $u_i = 0$ if $X_i > y_i$,
$i = 1, 2, \ldots, n$. Analogously with (2), we propose to use

$$ \hat{R} = \Phi\!\left(\frac{\hat{\mu}}{\sqrt{1+\hat{\sigma}^2}}\right). $$

Here $\hat{\mu}$ and $\hat{\sigma}^2$ are the maximum likelihood estimates of
$\mu$ and $\sigma^2$ determined from (7).

Before stating a theorem on the existence of proper maximum likelihood
estimates $\hat{\mu}$, $\hat{\sigma}^2$, that is, $-\infty < \hat{\mu} < \infty$
and $0 < \hat{\sigma}^2 < \infty$, it is worthwhile to exhibit the type of
sample sequences which lead to improper estimates of $\mu$ and $\sigma^2$. The
sample sequence $u_i$, $i = 1, 2, \ldots, n$, is a collection of n ordered
ones and zeros, which we denote by u. If u = (0, 0, ..., 0), this means that
no component failed, hence one is led to conclude that the mean strength is
high (relative to $y_1, y_2, \ldots, y_n$) and in fact, the logical estimator
$\hat{\mu}$ is $+\infty$. Similarly, if u = (1, 1, ..., 1), one is inclined to
set $\hat{\mu} = -\infty$. Similarly, if u = (1, 0, 1, 0, ..., 1, 0), the
experiment suggests strongly that the probability of failure does not change
as the y's change. In fact, it appears to be about 1/2, independent of the
$y_i$'s. This should suggest that the variance is very large and
$\hat{\sigma}^2 = +\infty$ is the reasonable choice. Finally, the sequence
(0, 0, ..., 0, 1, ..., 1) suggests a very small variance, which it is reasonable
to take to be zero. These degenerate cases do not invalidate the estimation of
R. In fact all but the last case can be treated as a binomial sample, in that
the probability of failure is essentially constant, that is, independent of
$y_1, y_2, \ldots, y_n$, and the usual binomial estimates apply. In the last
case, X has a very sharply peaked distribution with almost all its mass located
between the last zero and the first one, and thus R can be readily estimated
since $G_Y(y)$ has been assumed known. From these intuitive considerations,
the following theorem is suggested.

Theorem. A necessary and sufficient condition that $\hat{\mu}$ and
$\hat{\sigma}^2$ be finite is that the correlation between u and
$y = (y_1, y_2, \ldots, y_n)$ is positive.

Intuitively, this says that the probability of a failure should appear to
increase as the y's increase.

To obtain the estimates $\hat{\mu}$ and $\hat{\sigma}^2$, we reparametrize by
setting $\omega_1 = \mu/\sigma$ and $\omega_2 = 1/\sigma$. Then
$\log L(\omega_1, \omega_2)$, given by replacing $\mu$ and $\sigma$ in
(7) by $\omega_1/\omega_2$ and $1/\omega_2$ respectively, is a strictly concave
function and has a unique maximum which can be determined by any of a variety
of numerical methods. Substituting in $\hat{R}$ gives the estimate for R.
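
One of the "variety of numerical methods" is Newton's method, which is well suited to a concave log likelihood in two parameters. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes the probability of failure at stress $y_i$ to be $\Phi(\omega_2 y_i - \omega_1)$, starts from $(\omega_1, \omega_2) = (0, 1)$, and halves the step whenever the log likelihood would decrease. All names are hypothetical, and no check is made that the data satisfy the condition of the theorem.

```python
from math import log, sqrt
from statistics import NormalDist

STD = NormalDist()


def log_lik(w1, w2, y, u):
    """Log likelihood with P(failure at y_i) = Phi(w2 * y_i - w1)."""
    total = 0.0
    for yi, ui in zip(y, u):
        p = min(max(STD.cdf(w2 * yi - w1), 1e-300), 1.0 - 1e-16)
        total += log(p) if ui else log(1.0 - p)
    return total


def fit_quantal(y, u, max_iter=100, tol=1e-12):
    """Damped Newton iteration; returns (mu_hat, sigma_hat, R_hat)."""
    w1, w2 = 0.0, 1.0
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for yi, ui in zip(y, u):
            z = w2 * yi - w1
            if ui:                                  # failure observed
                r = STD.pdf(z) / STD.cdf(z)
                d1, d2 = r, -r * (z + r)
            else:                                   # no failure observed
                r = STD.pdf(z) / (1.0 - STD.cdf(z))
                d1, d2 = -r, -r * (r - z)
            g1 -= d1                                # dz/dw1 = -1
            g2 += d1 * yi                           # dz/dw2 = y_i
            h11 += d2
            h12 -= d2 * yi
            h22 += d2 * yi * yi
        det = h11 * h22 - h12 * h12
        s1 = -(h22 * g1 - h12 * g2) / det           # Newton step = -H^{-1} g
        s2 = -(h11 * g2 - h12 * g1) / det
        step, base = 1.0, log_lik(w1, w2, y, u)
        while step > 1e-8 and log_lik(w1 + step * s1, w2 + step * s2, y, u) < base - 1e-12:
            step *= 0.5
        w1, w2 = w1 + step * s1, w2 + step * s2
        if abs(s1) + abs(s2) < tol:
            break
    mu, sigma = w1 / w2, 1.0 / w2
    return mu, sigma, STD.cdf(mu / sqrt(1.0 + sigma ** 2))
```

As a small check, with three components tested at y = -1 (one failure) and three at y = 1 (two failures), symmetry gives $\hat{\mu} = 0$ and the fitted failure probabilities are exactly 1/3 and 2/3, so that $1/\hat{\sigma} = \Phi^{-1}(2/3)$ and $\hat{R} = 1/2$.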
Chi-square tests are widely used in testing statistical questions where
an actual computed variance can be compared with an expected variance defined
on the theory under test. Percentage counts and population counts fall in
this class; in the simplest model of each the expected variance is that of the
binomial and Poisson respectively. Chi-square is thus useful for enumerations,
but cannot be used directly for measurement statistics, which have no
theoretical expectation for variance. Enumerations arise commonly in measuring
biological populations, in comparing alternative forms of treatment, or in
comparing frequencies of accidents or other chance occurrences.

Where the actual internal variance on the model chosen exceeds the
theoretical variance, chi-square tests may show a significance not borne out
by repeated work. Caution must thus be used. Snedecor [6] (1956) shows a
procedure for comparing population counts: making sure the main experimental
treatments do not show significant internal chi-squares between subsamples
(and thus are "homogeneous"). The main treatments can then be compared by
chi-square. A quick chi-square test will often be time-saving; if populations
do not show differences by chi-square, they will not show significant
differences by any test.

While various forms of comparison of variances are carried out as chi-square
tests, the most frequent form is a somewhat approximate test of frequency
distributions. An actual distribution of numbers in several classes is compared
with counts expected by some theory; or two or more sets of classes are compared
to see if they differ, with their average serving as the theoretical distribution.
The chi-square is defined as the sum of ratios $(O-C)^2/C$, where C is the
calculated and O the observed number in each class. The test is related to
Poisson expectation [1].

Holt et al. (1967) have recently published an article on numbers required
for chi-square tests in forest insect work. They show formulae for estimating
the numbers which would bring a non-significant chi-square to significance if
the difference already found held up in further sampling. Examples used include
a comparison of actual frequencies to a negative binomial, and a 2 x 2 comparison
of sex differences in response to two attractants. Both are on insect data, and
both are suited to use of "two-tail" probabilities. "One-tail" odds are adapted
where the interest is only in a difference in one direction.

It would seem of more value to attempt to define numbers to detect a real
difference of a given size, if present, than to define numbers which would make
a certain observed difference significant. With greatly enlarged numbers
almost any trifling difference will test as significant. A useful aim is to
define numbers which will practically ensure detection of any important difference,
but will not waste effort in detecting unimportant differences. The level of
difference to be regarded as "unimportant" will depend on the experimental
problem.

A set of counts (Fleming & Baker [2], 1936) of Japanese beetle larvae on
individual square feet is available for study. Two areas of 375 square feet
each were defined, which seemed fairly homogeneous. The total count for each
area yielded averages of almost exactly 3 and 5 larvae per square foot
respectively.

Random samples of ten individual square feet were taken in each area.
Sample means were 2.9 and 4.9, with variances of 4 to 6. Chi-square between
the two totals was about 5, definitely significant at 5%. In the Poisson the
variance equals the mean. Hence, in the plot with a population mean infestation
of 3 larvae per square foot, the variance will also be 3 in the absence of
additional factors inflating the variance, and the 5 per square foot plots will
have a variance of 5. Use of these theoretical Poisson errors in a t test
yielded a t of about 2.2, corresponding well to the equivalent chi-square test.

In this case 10 units of each were sufficient to show the observed difference
to be significant. This observed difference is very close to the true difference
in this case.
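
The arithmetic behind these two figures can be retraced in a few lines. The sketch assumes that the chi-square is formed on the two sample totals (29 and 49 larvae, from means of 2.9 and 4.9 over ten units) with their average as the expected count, and that the t statistic uses the theoretical Poisson variances 3 and 5. The function names are hypothetical.

```python
from math import sqrt


def chi_square_two_totals(total_a, total_b):
    """Chi-square (1 d.f.) for two totals, expected count = their average."""
    expected = (total_a + total_b) / 2.0
    return ((total_a - expected) ** 2 + (total_b - expected) ** 2) / expected


def poisson_t(mean_a, mean_b, var_a, var_b, units):
    """Difference of means over its standard error, using given variances."""
    return (mean_b - mean_a) / sqrt((var_a + var_b) / units)


chi2 = chi_square_two_totals(29, 49)        # 200/39, about 5.1
t = poisson_t(2.9, 4.9, 3.0, 5.0, 10)       # 2/sqrt(0.8), about 2.24
```

Since chi-square with one degree of freedom is the square of a standard normal deviate, a t of about 2.24 corresponds to a chi-square of about 5.0, which is the agreement noted in the text.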

In further study, 20 samples of 7 units each were taken from each population.
Each sample was later expanded to 10, next to 12 and then to 26 units.
Chi-squares were calculated between the 20 pairs of samples in each case.

TABLE I

RESULTS OF TESTS

No. of Units              Number of Chi-squares
per Sample      Significant   Non-significant   Marginal

     7              11               8              1
    10              14               5              1
    12              16               4              0
    26              19               1              0

The 7-unit samples are evidently near the level at which half will be
significant. With 7 pairs of units, one from the 3 per square foot population,
one from the 5, the expected total will be 21 + 35 = 56 larvae. Using the
expected Poisson variance, the variance of the difference of the sample totals
will also be approximately 56. On an individual square foot basis this becomes
8; on a basis of means of 7 units, 1.14 (Snedecor, section 3.6). Actually,
8 square foot counts per sample meet the half-significance level in theory.

As  the  observed  results  of  Table  1  show,  the  number  of  significant  chi- 
squares  varies  as  the  number  of  square  foot  units  per  sample  does.  If  we  wish 
to  be  quite  sure  to  detect  a  difference  as  large  as  2  larvae  per  square  foot 
in  the  rate  of  infestation  it  will  be  necessary  to  sample  a  large  number  of 
units.  The  situation  is  diagramed  below. 


        0 ------------------ x ------------------ d

In a given but very large sample from two populations whose two counts
differ by d units, the observed difference, x, will be held close to the true
value, d, with high probability. At the same time, a test of significance
evaluates the departure of x from zero, also with high probability since the
sample is large. Reversing the argument, we may choose the significance test
probability level and also the probability level that x will be sufficiently
large to show significant departure from zero and then calculate how large
the sample must be. There is only one chance process involved, not two, one
with a true difference d and one with zero difference. It is postulated that
a true difference of d exists, so this is the one real chance process that
yields the observed value of x. The role of the test of significance for zero
difference is solely to determine by how large an interval the observed chance
variable x can fall short of the population mean d and yet yield a decision of
significance against the null hypothesis. Thus, in the actual experiment, that
value of sample size N is sought which ensures that the observed difference
will fall in the interval from d-x to infinity with the chosen probability.

The distribution of the difference of two Poisson variables when a true
difference exists is complex. But in most cases the normal approximation
will be entirely adequate, and here the existence of a true difference does not
disturb the normality of the distribution. If, for example, a 95% confidence
level  is  chosen  that,  if  a  true  difference  of  d  units  exists,  the  null  hypothesis 
will  be  rejected,  then  the  value  of  N  is  so  chosen  that  in  only  5%  of  the  trials, 
the  observed  difference  will  fall  to  the  left  of  x  in  the  diagram.  The  test 
is hence a one-tailed test. The usual test becomes

$$ t_1 = \frac{d - (\bar{x}_1 - \bar{x}_2)}{S_d} \qquad (1) $$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the samples from the two
areas in the field being compared, and d = 2 is the population difference we
wish to detect. The value of x is set by the null test significance level, as
follows. If the usual 5% level is chosen, then

$$ t_2 = \frac{x}{S_d} \qquad (2) $$

will be the t calculated on the null hypothesis, where x is the observed
difference in sample means. Hence $x = t_2 S_d$.

From the diagram it is seen that, from (1) and (2),

$$ (d - x) + x = t_1 S_d + t_2 S_d $$

or

$$ d = (t_1 + t_2)\, S_d \qquad (3) $$

The null hypothesis test of significance will be referred to standard t tables.
At the 5% two-tailed level of significance and infinite degrees of freedom
t = 1.96. But the t tables are two-tailed, whereas what is wanted is a
one-tailed level of performance (only too small an observed x will be judged
nonsignificant). Since the t distribution is symmetric, the value taken from
the tables at the 10% level is the 5% level for a one-tailed test. This value
is 1.64. Substituting these values in equation (3) gives

$$ 2 = (1.96 + 1.64)\, S_d $$

or

$$ S_d = 2 \div 3.6 = 0.56 \qquad (4) $$

This leads to a needed variance of $(0.56)^2 = 0.31$ to be fairly sure (95%) of
detecting a difference at a 5% significance level.

The discussion above is an elaboration of the argument of Cochran
(Appendix to Ladell, 1938). This treatment is reproduced in Cochran and Cox
for both one-tail and two-tail tests and with 80, 90, and 95% confidence that
the hypothesized true per cent treatment difference will yield a significant
test against the null hypothesis.


It remains to apply this result to our data to determine the size of the
sample needed to discriminate between two areas, one infested at a rate of 3
larvae per square foot and one at a rate of 5 larvae per square foot. If
equal numbers of one square foot units, sampled from each of the areas to be
compared for level of infestation, are examined, then the theoretical variance
of the difference between the means is:

$$ V(d) = \frac{M_1 + M_2}{N} $$

(Snedecor, section 7.10)

or

$$ N = \frac{M_1 + M_2}{V(d)} = \frac{3 + 5}{0.31} = \frac{8}{0.31} = 26 \qquad (5) $$
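
Equations (3) to (5) can be combined into a single sample-size calculation. The sketch below is illustrative and rests on stated assumptions: it uses normal quantiles from Python's `statistics.NormalDist` in place of t tables with infinite degrees of freedom, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def units_per_sample(mean_1, mean_2, alpha_two_sided=0.05, assurance=0.95):
    """Units needed per area so that d = (t_null + t_power) * S_d.

    Assumes Poisson counts (variance = mean) and equal sample sizes.
    """
    d = abs(mean_2 - mean_1)
    t_null = NormalDist().inv_cdf(1 - alpha_two_sided / 2)  # about 1.96
    t_power = NormalDist().inv_cdf(assurance)               # about 1.64
    s_d = d / (t_null + t_power)          # required standard error, eq. (4)
    return math.ceil((mean_1 + mean_2) / s_d ** 2)          # eq. (5)


n_95 = units_per_sample(3, 5)                  # high assurance of detection
n_50 = units_per_sample(3, 5, assurance=0.50)  # "half-significance" level
print(n_95, n_50)
```

Setting the assurance to 50% makes the second quantile zero, which corresponds to the half-significance level discussed for the 7-unit and 8-unit samples.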


That is, if we wish to discriminate between two fields where one has an
infestation rate of 3 larvae per square foot and the other 5 larvae per square
foot, then 26 square feet will have to be examined for each. Of course, the
usual requirements for a random selection of the 26 units from the total area
of each field will have to be observed.

Practical  workers  will  often  have  a  good  idea  of  the  level  of  infestation 
economically  acceptable,  and  how  much  larger  the  infestation  must  be  to  make 
treatment  application  worthwhile.  Hence,  the  above  approach  should  often  prove 
useful. 

If, however, the infestation rate is taken as derived solely on the basis
of a preliminary trial, the formula can be modified to employ the estimated
variance from the preliminary trial. This might well be indicated if the
investigator felt doubtful about accepting the applicability of the Poisson
variance in his work. Suppose a sample of 7 square foot units distributed at
random over each of the two areas is taken and that (the Poisson distribution
being applicable, as proved true in Fleming and Baker's data) the observed
variance of the difference turned out to be 1.14. What is desired is the size
of trial needed to discriminate a difference of 2 larvae per square foot in the
infestation rate of the two areas. A basic formula is that the variance of a
mean is inversely proportional to the number of units in that mean (Snedecor,
section 3.6)

$$ N_1 V(\bar{x}_1) = N_2 V(\bar{x}_2) = \text{Constant}, \qquad (6) $$

where $\bar{x}_1$ is the mean of $N_1$ units and $\bar{x}_2$ is the mean of
$N_2$ units from the same basic population.

Hence $N_2 = 7(1.14) \div (0.31) = 26$. (7)

The answer in (7) is the same as that in (5) since the Poisson formula applies
in practice. The approach represented by formula (6) is general, however.
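
The rescaling in formulas (6) and (7) is a one-line computation. The following sketch is illustrative; the function name is hypothetical, and rounding up to a whole number of units is an assumption about how the fractional result is handled.

```python
import math


def rescale_sample_size(pilot_units, pilot_variance, target_variance):
    """Units needed so the variance of the difference of means hits the target.

    Uses N1 * V1 = N2 * V2 (variance of a mean is inversely proportional to N).
    """
    return math.ceil(pilot_units * pilot_variance / target_variance)


n2 = rescale_sample_size(7, 1.14, 0.31)
print(n2)
```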

It is only because the t-test applies both to equation (1) and equation
(2) that it could be used for both. In the general case (for non-normal base
distributions) equation (2) would be referred to the non-central form of the
applicable distribution.

Further,  in  the  example  we  used,  we  only  wanted  a  one-sided  test,  since 
we  specified  which  population  would  be  larger,  if  one  were.  If  instead  we 
had  only  wished  to  determine  whether  the  fields  could  be  considered  equally 
infested,  and  decided  that  such  a  conclusion  would  be  acceptable  unless  the 
infestation  differed  by  at  least  two  larvae  per  square  foot,  the  t-probability 
would  be  used  for  5%  significance  to  accommodate  a  difference  in  either 
direction.  Actually  of  course  a  reference  to  the  normal  table  might  well  be 
more  convenient,  where  the  one-tailed  probability  for  2.5%  would  be  used  to 
obtain 95% confidence that a true difference of 2 larvae would be detected,
where  either  field  could  be  the  more  heavily  infested. 

The theory is verified in Table I by the outcome with 26 units per sample;
the 95% expectation is met more exactly than usual. If a smaller difference
is to be detected, the larger number per sample needed can be defined as is done
above; a larger difference can be detected by smaller samples. The difference
to be detected must be equated to $3.60\,S_d$ (with the selected probability levels).
The general level of population per unit must be approximately known or some
preliminary information is needed, perhaps from partial samples because, in
the Poisson, the variance is equal to the mean.

The  procedure  is  then  to  define  approximately  the  level  of  population 
and  theoretical  variance  from  preliminary  samples.  The  difference  to  be  detected 
must  be  specified.  This  difference  is  then  equated  to  3.6  times  the  needed 
standard  error.  With  the  variance  of  the  mean  thus  defined,  numbers  can  be 
specified  to  yield  this  variance,  and  to  give  high  confidence  of  detection  of 
the  specified  difference.  Numbers  are  over  three  times  as  great  as  are  required 
for  50%  confidence  of  detection,  and  still  higher  for  higher  confidence  levels. 
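
The expected proportion of significant tests at each sample size in Table I can be approximated under the same normal theory. This is a rough sketch under stated assumptions: Poisson variances of 3 and 5, a two-tailed 5% null test with t = 1.96, and the negligible chance of significance in the wrong direction ignored. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def detection_probability(units, mean_1=3.0, mean_2=5.0, t_null=1.96):
    """Approximate chance that the observed difference exceeds t_null * S_d."""
    s_d = math.sqrt((mean_1 + mean_2) / units)  # standard error of the difference
    return NormalDist().cdf(abs(mean_2 - mean_1) / s_d - t_null)


for units in (7, 10, 12, 26):
    print(units, round(detection_probability(units), 2))
```

Under these assumptions the probability is a little below one half at 7 units, just above one half at 8, and close to 95% at 26, in line with the pattern of Table I.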

In addition to the comparison of two Poisson counts discussed in detail
above, count data arises in the comparison of percentages and in contingency
tables. Paulson and Wallace (1947) treat the case of choosing a sample size
for the comparison of two percentages. Przyborowski and Wilenski show that
when the percentage of successes in each of two series is small compared to the
number of observations in each series, the successes alone can be analyzed as
if they were observations on a binomial variate, and that the latter provides
an exact test for the ratio of parameters from two independent Poisson trials.
Hoel (1945), however, showed that an exact test should rarely be needed. The
power function for 2X2 tables was discussed by Pearson and Merrington.
Clark and Downie (1966) provided charts for determining sample sizes for
discriminating two proportions at the 50, 80, and 95% probability level. Bennett
and Hsu (1960) give the power function for the exact test for the 2X2 table.
Halperin, Rogot, Gurian and Ederer have prepared tables from which sample
sizes required for comparative trials of two forms of repeated treatments can
be determined. While intended for medical trials, it is possible that their
results could be applied to a situation involving, say, periodic maintenance,
or to perennial crop culture or to a regimen for the maintenance of field
fertility.


ABSTRACT. A rule for selecting runs in a sensitivity test and a
method for analysing the experimental results from those runs are
presented in this paper. The analysis techniques provide confidence
bounds and limits.

INTRODUCTION. A binomial (or Bernoulli) experiment is one involving
the observations of a random variable which represents the outcomes of
experimental runs. The outcome of a run has only two attributes which
are referred to as go and no-go or as success and failure. A class of
binomial experiments, referred to as sensitivity tests, may be described
as having the following properties.

1. The experiment involves many specimens which are identical insofar
as the experimenter can distinguish.

2. During the experiment, each specimen is subjected to a stimulus
which is controlled and measurable. This is a run of the experiment.

3. After the specimen has been subjected to the known stimulus, it is
observed to be in one or the other of the two possible outcome states.

4. No specimen is subjected to a stimulus more than once.


The  remainder  of  this  article  was  reproduced  photographically  from  the 
author's  copy. 


The object of a sensitivity test is to determine the probability,
as a function of the applied stimulus, that a specimen will
be in one of the two outcome states. We emphasize that, in a
sensitivity test, a known amount of stimulus is applied, and then the
outcome is observed. There is another class of experiments, called
tests to destruction, which may be related to and are often
confused with sensitivity tests. In tests to destruction, the applied
stimulus is measured that causes the specimen to reach a prescribed
state. This is basically a different type of experiment from a
sensitivity test even though the prescribed state of the specimen may
be labeled success and all other states labeled failure. Thus, the
outcome state is known for each run and the random variable is the
amount of stimulus required to reach that state. Tests to destruction
are not binomial experiments since the observed random variable in
such experiments usually possesses a continuum of values. In a test
to destruction, one may seek the density function $f(t)$ for the random
variable T (the stimulus); and, if there is an equivalent sensitivity
test, in which one obtains an estimate for the probability of success
$p(\tau)$ as a function of the amount of stimulus, then the following
relation exists

$$ p(\tau) = \int_0^{\tau} f(t)\,dt. $$

However, given only experimental data for $p(\tau)$ acquired from a
sensitivity test, no confidence can be stated for relating a density
function to the parameter $\tau$ by trying to consider it to be the random
variable of a test to destruction. The reason for mentioning tests
to destruction is to warn the experimenter that analysis techniques
developed for them, those which assume a distribution for the
stimulus, are not directly applicable to sensitivity tests.

Often an experiment can only be performed as a sensitivity test;
and even when it can be performed either as a sensitivity test or as
a test to destruction, sensitivity testing is chosen for reasons of
economy and expediency. For this reason, this paper is devoted to
developing a rule for selecting runs in a sensitivity test and to an
associated method of analysis.

Definitions and Notation. In a sensitivity test, an amount of
stimulus $\tau_k$ is applied to the k-th specimen and the outcome
$\eta(\tau_k)$ is observed, where $\eta(\tau_k)$ can assume either the value 1,
referred to as go, or the value 0, referred to as no-go. More specifically,
we assume that there is a sequence of I values of stimulus such that

$$ \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_{I-1}. $$

For each value $\tau_i$, $0 \le i \le I-1$, $N_i$ runs are performed, and

$$ N = \sum_{i=0}^{I-1} N_i \qquad (1) $$

is the total number of runs for the sensitivity test. Thus

$$ \eta_j(\tau_i) = \begin{cases} 0 & \text{for no-go} \\ 1 & \text{for go} \end{cases} \qquad (2) $$

is the outcome of the j-th run, $1 \le j \le N_i$, for $\tau = \tau_i$. The number of
go's, $n_{N_i}(\tau_i)$, in the $N_i$ runs for $\tau_i$ is given by

$$ n_{N_i}(\tau_i) = \sum_{j=1}^{N_i} \eta_j(\tau_i). \qquad (3) $$

The probability of go for $\tau = \tau_i$ is designated by $p(\tau_i)$, $0 \le p(\tau_i) \le 1$;
and the expected number of go's for $\tau = \tau_i$ given $N_i$ runs is

$$ E[n_{N_i}(\tau_i)] = N_i\, p(\tau_i). \qquad (4) $$

An estimate, $\hat{p}_{N_i}(\tau_i)$, for $p(\tau_i)$ is given by

$$ \hat{p}_{N_i}(\tau_i) = \frac{n_{N_i}(\tau_i)}{N_i} = \frac{1}{N_i} \sum_{j=1}^{N_i} \eta_j(\tau_i). \qquad (5) $$

The standard deviation for the number of go's $n_{N_i}(\tau_i)$ when performing
$N_i$ runs is

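$$ \sigma[n_{N_i}(\tau_i)] = \sqrt{N_i\, p(\tau_i)\,[1 - p(\tau_i)]} \qquad (6) $$

and is estimated by

$$ \hat{\sigma}[n_{N_i}(\tau_i)] = \sqrt{\frac{n_{N_i}(\tau_i)\,[N_i - n_{N_i}(\tau_i)]}{N_i - 1}}. \qquad (7) $$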


An estimate for the accuracy of $\hat{p}_{N_i}(\tau_i)$ is obtained by knowing
that the standardized variable

$$ \frac{n_{N_i}(\tau_i) - N_i\, p(\tau_i)}{\sigma[n_{N_i}(\tau_i)]} $$

is asymptotically normal for large $N_i$ when $n_{N_i}(\tau_i)$ is a binomial
random variable. Thus, the cumulative probability P is approximated by

$$ P\left\{ -k \le \frac{n_{N_i}(\tau_i) - N_i\, p(\tau_i)}{\sigma[n_{N_i}(\tau_i)]} \le k \right\} = \frac{1}{\sqrt{2\pi}} \int_{-k}^{k} e^{-t^2/2}\,dt $$

or

$$ P\left\{ \left| p(\tau_i) - \hat{p}_{N_i}(\tau_i) \right| \le \frac{k}{N_i}\, \sigma[n_{N_i}(\tau_i)] \right\} = \frac{1}{\sqrt{2\pi}} \int_{-k}^{k} e^{-t^2/2}\,dt. \qquad (8) $$

$\frac{k}{N_i}\, \sigma[n_{N_i}(\tau_i)]$ is the confidence bound for the absolute difference of
$p(\tau_i)$ and $\hat{p}_{N_i}(\tau_i)$; and the right side of Eq. (8) is the confidence
(limit) with which the bound holds. Since there is no way of determining
$\sigma[n_{N_i}(\tau_i)]$ from experimental data, Eq. (8) is approximated by

$$ \hat{P}\left\{ \left| p(\tau_i) - \hat{p}_{N_i}(\tau_i) \right| \le \frac{k}{N_i}\, \hat{\sigma}[n_{N_i}(\tau_i)] \right\} = \frac{1}{\sqrt{2\pi}} \int_{-k}^{k} e^{-t^2/2}\,dt \qquad (9) $$

where the notation $\hat{P}$ implies that an estimated confidence bound has
replaced the confidence bound in P. The value of k is determined by
setting $\hat{P}$ equal to the desired level of confidence, say 0.95, and solving

$$ \frac{1}{\sqrt{2\pi}} \int_{-k}^{k} e^{-t^2/2}\,dt = \hat{P} $$

for k. Usually, when $\hat{\sigma}[n_{N_i}(\tau_i)]$ is used in place of $\sigma[n_{N_i}(\tau_i)]$,
the Student t distribution is considered more appropriate for the
right side of Eq. (9), and the confidence bound is enlarged to
accommodate the uncertainty involved from not knowing $\sigma$. However,
for purposes of presentation, we use the normal distribution
approximation for the asymptotic property of the binomial random variable.
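
The estimate and its confidence bound at a single stimulus level can be sketched in a few lines. This is a minimal illustration, not the authors' program: it assumes the simple plug-in estimate sqrt(N p̂ (1 - p̂)) for the standard deviation of the number of go's (the paper's own estimator may differ), uses the normal approximation for k, and all names are hypothetical.

```python
import math
from statistics import NormalDist


def go_probability_estimate(outcomes, confidence=0.95):
    """Return (p_hat, bound) for a list of 0/1 outcomes at one stimulus level.

    p_hat is the fraction of go's; bound is (k / N) * sigma_hat, where k is the
    two-sided normal quantile for the requested confidence.
    """
    n_runs = len(outcomes)
    n_go = sum(outcomes)
    p_hat = n_go / n_runs
    # Plug-in estimate of the standard deviation of the number of go's (assumption).
    sigma_hat = math.sqrt(n_runs * p_hat * (1 - p_hat))
    k = NormalDist().inv_cdf((1 + confidence) / 2)
    return p_hat, k * sigma_hat / n_runs


p_hat, bound = go_probability_estimate([1] * 30 + [0] * 10)
print(round(p_hat, 3), round(bound, 3))
```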

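Using the probability of go $p(\tau)$, the average of the probability
of go over a 2T interval of $\tau$, centered at $\tau = \tau_i$, is defined as

$$ p_T(\tau_i) = \frac{1}{2T} \int_{\tau_i - T}^{\tau_i + T} p(\tau)\,d\tau. \qquad (10) $$

Let N experimental runs be performed at each value

$$ \tau_{i+m} = \tau_i + m\,\Delta\tau \qquad (11) $$

where $m = -M, \ldots, -1, 0, 1, \ldots, M$ and $\Delta\tau = T/M$; then an estimate for the
average probability is given by

$$ \hat{p}_{M,N}(\tau_i) = \frac{1}{2M+1} \sum_{m=-M}^{M} \hat{p}_N(\tau_{i+m}) = \frac{1}{N(2M+1)} \sum_{m=-M}^{M} \sum_{j=1}^{N} \eta_j(\tau_{i+m}). \qquad (12) $$

In the particular case when N = 1,

$$ \hat{p}_M(\tau_i) = \frac{1}{2M+1} \sum_{m=-M}^{M} \eta(\tau_{i+m}) \qquad (13) $$

where $\eta(\tau_{i+m}) = \eta_1(\tau_{i+m})$. An equation similar to Eq. (9) can be
derived for the confidence and confidence bound for the absolute
difference of $p_T(\tau_i)$ and $\hat{p}_M(\tau_i)$. Define

$$ n_M(\tau_i) = \sum_{m=-M}^{M} \eta(\tau_{i+m}); \qquad (14) $$

then

$$ \hat{\sigma}[n_M(\tau_i)] = \sqrt{(2M+1)\,\hat{p}_M(\tau_i)\,[1 - \hat{p}_M(\tau_i)]} \qquad (15) $$

and

$$ \hat{P}\left\{ \left| p_T(\tau_i) - \hat{p}_M(\tau_i) \right| \le \frac{k}{2M+1}\, \hat{\sigma}[n_M(\tau_i)] \right\} = \frac{1}{\sqrt{2\pi}} \int_{-k}^{k} e^{-t^2/2}\,dt. \qquad (16) $$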

Thus, in a sensitivity test one performs runs at various levels
of the stimulus $\tau$ and observes the outcome $\eta(\tau)$, as defined in Eq. (2),
for each run. From the outcomes, one calculates statistics such as
$n_{N_i}(\tau_i)$ of Eq. (3) or $n_M(\tau_i)$ of Eq. (14) and then makes estimates for
the probability of go $\hat{p}_{N_i}(\tau_i)$ using Eq. (5) or the average probability
of go $\hat{p}_M(\tau_i)$ using Eq. (13). Finally, one makes estimates of the error
involved in these probabilities using Eq. (9) or (16). When one
follows the procedure that uses Eqs. (3), (5), and (9), he is following
an experimental design referred to as the Probit method. In following
this method, when evaluating $\hat{p}_{N_i}(\tau_i)$ for

$$ \tau_i = \tau_0 + i\,\Delta\tau, \quad 0 \le i \le I-1, $$

and when using $N_i = 2M+1$ for all i, the total number of runs is $(2M+1)I$.
We present a method that uses Eqs. (13), (14) and (16) which requires
$2M+I$ runs at

$$ \tau_i = \tau_0 + i\,\Delta\tau, \quad -M \le i \le I+M-1. $$

A Selection Rule. In every sensitivity test, the experimenter
possesses either explicit or implicit information about the upper and
lower values, $\tau_U$ and $\tau_L$, of the stimulus he will use in the
experiment and about the spacing $\Delta\tau$ between values of $\tau$ at which he
desires estimates for the probability of go $p(\tau)$. The values of $\tau_U$,
$\tau_L$ and $\Delta\tau$ are determined by previous knowledge about the specimen,
by the total amount of time that can be devoted to the experiment,
by the experimental equipment or some combination of these. For
example, $\Delta\tau$ may be determined by the ability to measure the stimulus.
$\tau_L$ is always greater than or equal to zero, and $\tau_U$ might be
determined through the rationale that if it exceeds a prespecified amount
then the specimen is no longer of experimental interest. We assume
then that the values of $\tau_L$, $\tau_U$ and $\Delta\tau$ are explicitly specified. In
addition, the experimenter must specify the confidence $\hat{P}$ and the
confidence bound

$$ b = \frac{k}{N_i}\, \hat{\sigma}[n_{N_i}(\tau_i)] $$

for $|p(\tau_i) - \hat{p}_{N_i}(\tau_i)|$ or

$$ b = \frac{k}{2M+1}\, \hat{\sigma}[n_M(\tau_i)] \qquad (17) $$

for $|p_T(\tau_i) - \hat{p}_M(\tau_i)|$. Although the experimenter sometimes has
additional information that greatly simplifies the selection of
experimental runs, he is often faced with only the information that
$0 \le p(\tau) \le 1$ for $\tau$ in an interval of length $D(\tau) \le \tau_U - \tau_L$ and that this
interval lies in $[\tau_L, \tau_U]$. The exact length of $D(\tau)$ and where it is
and since

$$ \tau_i^{(0)} = \tau_L + i\,\Delta\tau_0, \quad i = 0, 1, \ldots, N_s^{(0)} - 1. $$

Let the outcomes of these runs be designated by $\eta(\tau_i^{(0)})$. If these
outcomes do not include $\eta(\tau_0^{(0)}) = 0$ and $\eta(\tau_{N_s^{(0)}-1}^{(0)}) = 1$, the
experimenter should reconsider his experiment. Therefore, assume
$\eta(\tau_0^{(0)}) = 0$ and $\eta(\tau_{N_s^{(0)}-1}^{(0)}) = 1$ in the remainder of this discussion. If ten
or more sequential outcomes are identical and one of these is at the
end value $\tau_0^{(0)}$ or $\tau_{N_s^{(0)}-1}^{(0)}$, then remove that end value from further
consideration. Next, choose values of $\tau$ that are midway between the
consecutive values of $\tau^{(0)}$ and perform runs at these values of $\tau$.
Now, we possess outcomes for values of $\tau_i^{(1)}$ for

$$ \tau_i^{(1)} = \tau_0^{(1)} + i\,\Delta\tau_1, \quad i = 0, 1, 2, \ldots, N_s^{(1)} - 1, $$

where

$$ \Delta\tau_1 = \Delta\tau_0 / 2, $$

and the value of $N_s^{(1)}$ depends on whether either end value of $\tau^{(0)}$ was discarded.


Again, check to see if there are ten or more sequential $\eta(\tau_i^{(1)})$ that
are alike where one of these $\eta(\tau_i^{(1)})$ is either $\eta(\tau_0^{(1)})$ or $\eta(\tau_{N_s^{(1)}-1}^{(1)})$.
If there are, remove from further consideration all but the nine
identical $\eta(\tau_i^{(1)})$ which are next to an opposite outcome. Next, choose
new values of $\tau$ that are midway between the consecutive values of the
$\tau^{(1)}$ that were saved and make runs at these values of $\tau$. Now, we
possess outcomes at

$$ \tau_i^{(2)} = \tau_0^{(2)} + i\,\Delta\tau_2, \quad i = 0, 1, \ldots, N_s^{(2)} - 1, $$

where

$$ \Delta\tau_2 = \frac{\Delta\tau_1}{2} = \frac{\Delta\tau_0}{2^2}. $$

Check to see if there are ten or more sequential $\eta(\tau_i^{(2)})$ that are
identical and remove from further consideration all but the nine
which are next to an opposite outcome. This procedure of setting
$\Delta\tau_t = \Delta\tau_{t-1}/2$, performing additional runs and casting out all but nine of
the identical sequential values of $\eta(\tau_i)$ is continued until

$$ \Delta\tau_t \le \Delta\tau $$

if one is planning to use the Probit method of analysis. For the
method of analysis described in the next section the stopping rule
is somewhat more complex and is given in that section.


A Method of Analysis. This method of analysis is designed primarily
to acquire estimates $\hat{p}_M(\tau_i)$ for the average probability of go $p_T(\tau_i)$
and secondarily for an approximation for the probability of go
$p(\tau)$. From the desired confidence $\hat{P}$ and confidence bound b, the value
of M is determined by first choosing k such that

$$ \hat{P} = \frac{1}{\sqrt{2\pi}} \int_{-k}^{k} e^{-t^2/2}\,dt $$

and, then, choosing M to be the smallest integer greater than or equal
to $M^{*}$ where

$$ M^{*} = \frac{k^2}{8\,b^2} - \frac{1}{2}. $$

Assume now that for the specified values of M, $\tau_L$, $\tau_U$ and $\Delta\tau$,
the run selection rule has provided a set of outcomes $\eta(\tau_i)$
where

$$ \tau_i = \tau_0 + i\,\Delta\tau, \quad -M \le i \le I+M-1, $$

$I \ge 1$, and $\tau_{-M} = \tau_0^{(t)}$ from the selection rule. Using these values
of $\eta(\tau_i)$, $\hat{p}_M(\tau_i)$ is calculated by Eq. (13) for each i, $0 \le i \le I-1$.
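
The choice of the window half-width M from the desired confidence and bound can be sketched numerically. The calculation below is an assumption-laden illustration: it takes the bound to be k σ̂ / (2M+1) with σ̂ at its largest possible value, sqrt(2M+1)/2 (that is, p̂ = 1/2), which gives M ≥ k²/(8b²) - 1/2. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def window_half_width(confidence, bound):
    """Smallest M whose worst-case bound k / (2 * sqrt(2M + 1)) is <= bound."""
    k = NormalDist().inv_cdf((1 + confidence) / 2)
    m_star = k ** 2 / (8 * bound ** 2) - 0.5
    return max(0, math.ceil(m_star))


M = window_half_width(0.95, 0.10)
print(M, 2 * M + 1)
```

With 95% confidence and a bound of 0.10 this gives a window of 97 outcomes, which shows how quickly a tight bound drives up the number of runs entering each estimate.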

The stopping rule for the selection rule when obtaining
estimates $\hat{p}_M(\tau_i)$ for the average probability of go $p_T(\tau_i)$ always
considers the requirement for $2M+I$ outcomes $\eta(\tau_i)$ for equally spaced,
sequential values of $\tau$ with separation $\Delta\tau$. Thus, if for any integer
t where $\Delta\tau_t = \Delta\tau_0 / 2^t > \Delta\tau$, one can satisfy the two conditions:

1. $\Delta\tau_{t+p} \le \Delta\tau$ and $\Delta\tau_{t+p-1} > \Delta\tau$

and

2. $N_s^{(t+p)} \ge 2M+I$ and $N_s^{(t+p-1)} < 2M+I$

for some positive integer p where

$$ N_s^{(t+p)} = 2^p N_s^{(t)} - 2^p + 1, $$

then one no longer removes identical outcomes from further
consideration but performs runs at those values

$$ \tau_i^{(t+p)} = \tau_0^{(t)} + i\,\Delta\tau_{t+p}, \quad i = 0, 1, \ldots, N_s^{(t+p)} - 1 $$

for which runs have not already been performed. If these conditions
are not satisfied then the procedure is terminated in the same manner
as for the Probit method.

From the I values of $\hat{p}_M(\tau_i)$, a piece-wise linear approximation

$$ p_J(\tau) = \begin{cases} 0 & \text{for } 0 \le \tau < \tau_{-} \\ \dfrac{\tau - \tau_{-}}{\tau_{+} - \tau_{-}} & \text{for } \tau_{-} \le \tau \le \tau_{+} \\ 1 & \text{for } \tau > \tau_{+}, \end{cases} $$

is calculated for $p(\tau)$. This linear approximation is acquired by a
least square fit of $p_{M,J}(\tau_i)$,

$$ p_{M,J}(\tau_i) = \frac{1}{2M+1} \sum_{m=-M}^{M} p_J(\tau_{i+m}), $$

to $\hat{p}_M(\tau_i)$. The least square procedure is to choose the value of i,
say i = J, where $\hat{p}_M(\tau_J) \approx \frac{1}{2}$ and set $\tau_{-} = \tau_{J-c}$ and $\tau_{+} = \tau_{J+c}$. Then

$$ p_J(\tau_i) = \begin{cases} 0 & \text{for } i < J-c \\ \dfrac{\tau_i - \tau_{J-c}}{\tau_{J+c} - \tau_{J-c}} & \text{for } J-c \le i \le J+c \\ 1 & \text{for } i > J+c \end{cases} $$

and $p_{M,J}(\tau_i)$ is calculated as

$$ p_{M,J}(\tau_i) = \frac{1}{2M+1} \sum_{m=-M}^{M} p_J(\tau_{i+m}), \quad 0 \le i \le I-1. $$

Next, calculate

$$ \delta_{J,i}^{(c)} = \hat{p}_M(\tau_i) - p_{M,J}(\tau_i) $$

for the admissible values of i, and then calculate

$$ S_J^{(c)} = \sum_{i=0}^{I-1} \left[ \delta_{J,i}^{(c)} \right]^2. $$

By performing these calculations for c = 1, 2, 3, ..., a sequence of values
$S_J^{(c)}$ is generated, and the desired $p_J(\tau)$ curve is acquired from the choice
$c^{*}$ of c which minimizes $S_J^{(c)}$, i.e.,

$$ S_J^{(c^{*})} = \min_c \{ S_J^{(c)} \}. $$

The preceding calculations are now repeated by replacing J by J+p
to obtain a sequence of minimized $S_{J+p}^{(c_p^{*})}$ for $p = 0, \pm 1, \pm 2, \ldots$, i.e.,

$$ \ldots,\ S_{J-2}^{(c_{-2}^{*})},\ S_{J-1}^{(c_{-1}^{*})},\ S_J^{(c_0^{*})},\ S_{J+1}^{(c_1^{*})},\ S_{J+2}^{(c_2^{*})},\ \ldots $$

The optimum solution for $p(\tau)$ is that $p_{J+p}(\tau)$ for which p minimizes
the preceding sequence.
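
The least square search for the piece-wise linear curve can be sketched as an exhaustive search over the centre index J and half-width c. This is an interpretation under stated assumptions: stimulus values are equally spaced (so the ramp depends only on the index), the ramp is smoothed with the same 2M+1 window before comparison with the estimates, and a brute-force search replaces the paper's stepwise search starting near the 50% point. All names are hypothetical.

```python
def ramp(i, J, c):
    """Piece-wise linear curve on an index scale: 0 below J-c, 1 above J+c."""
    if i < J - c:
        return 0.0
    if i > J + c:
        return 1.0
    return (i - (J - c)) / (2 * c)


def smoothed_ramp(i, J, c, M):
    """Average of the ramp over the 2M+1 indices centred at i."""
    return sum(ramp(i + m, J, c) for m in range(-M, M + 1)) / (2 * M + 1)


def fit_ramp(p_hat, M):
    """Return (sum of squares, J, c) minimizing the squared deviations."""
    n = len(p_hat)
    best = None
    for J in range(n):
        for c in range(1, n):
            s = sum((p_hat[i] - smoothed_ramp(i, J, c, M)) ** 2 for i in range(n))
            if best is None or s < best[0]:
                best = (s, J, c)
    return best


# Synthetic check: estimates generated from a known ramp are recovered.
synthetic = [smoothed_ramp(i, 10, 3, 2) for i in range(21)]
s_min, J_best, c_best = fit_ramp(synthetic, M=2)
print(s_min, J_best, c_best)
```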

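Statistical Considerations. The statement of Eq. (16) is based on
the assumptions that $p_T(\tau_i)$ is the probability of a binomial random
variable given that $p(\tau)$ is the probability of a binomial random
variable and that $\hat{p}_M(\tau_i)$ is a consistent estimator of $p_T(\tau_i)$. To show that
$p_T(\tau_i)$ is the probability of a binomial random variable we need only
show that

$$ 1 - p_T(\tau_i) = q_T(\tau_i) $$

where $q(\tau) = 1 - p(\tau)$. Since

$$ \frac{1}{2T} \int_{\tau_i - T}^{\tau_i + T} d\tau = 1 $$

then

$$ 1 - p_T(\tau_i) = \frac{1}{2T} \int_{\tau_i - T}^{\tau_i + T} d\tau - \frac{1}{2T} \int_{\tau_i - T}^{\tau_i + T} p(\tau)\,d\tau = \frac{1}{2T} \int_{\tau_i - T}^{\tau_i + T} [1 - p(\tau)]\,d\tau = \frac{1}{2T} \int_{\tau_i - T}^{\tau_i + T} q(\tau)\,d\tau = q_T(\tau_i). $$

To show that $\hat{p}_M(\tau_i)$ is a consistent estimator of $p_T(\tau_i)$ we show
that $\hat{p}_{M,N}(\tau_i)$, as defined by Eq. (12) with N replacing $N_i$, is a
consistent estimator of

$$ \frac{1}{2M+1} \sum_{m=-M}^{M} p(\tau_{i+m}) \approx \frac{1}{2T} \int_{\tau_i - T}^{\tau_i + T} p(\tau)\,d\tau = p_T(\tau_i) $$

and by assuming that

a. $p(\tau)$ is continuous in $\tau$
b. $p(\tau') \le p(\tau'')$ for all $\tau' \le \tau''$
c. $T \ge 0$ is finite and $\Delta\tau = T/M$
d. $p(0) = 0$ and there exists a $\tau_U$, $0 < \tau_U < \infty$,
such that $p(\tau_U) = 1$.

The expected value of $\hat{p}_{M,N}(\tau_i)$ satisfies

$$ E[\hat{p}_{M,N}(\tau_i)] = \frac{1}{2M+1} \sum_{m=-M}^{M} p(\tau_{i+m}) $$

for fixed T. Furthermore, for M = 0

$$ \lim_{N \to \infty} E[\hat{p}_{0,N}(\tau_i)] = p(\tau_i) $$

and for N = 1,

$$ \lim_{M \to \infty} E[\hat{p}_M(\tau_i)] = p_T(\tau_i). $$

Thus, to show consistency, we must still show that the variance of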

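$\hat{p}_{M,N}(\tau_i)$ can be made sufficiently small by appropriate choices of M
and N. To simplify notation, define

$$ A_j = \sum_{m=-M}^{M} \eta_j(\tau_{i+m}); $$

then

$$ \hat{p}_{M,N}(\tau_i) = \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j $$

and

$$ \sigma^2\left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right] = E\left\{ \left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right]^2 \right\} - E^2\left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right]. $$

First consider the term

$$ E\left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right] = \frac{1}{N(2M+1)} \sum_{j=1}^{N} E[A_j] = \frac{E[A_j]}{2M+1}, $$

$$ E^2\left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right] = \frac{E^2(A_j)}{(2M+1)^2}. $$

Next consider

$$ E\left\{ \left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right]^2 \right\} = E\left\{ \left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right] \left[ \frac{1}{N(2M+1)} \sum_{k=1}^{N} A_k \right] \right\} $$

$$ = \left[ \frac{1}{N(2M+1)} \right]^2 \left[ \sum_{j=1}^{N} E(A_j^2) + \sum_{j=1}^{N} \sum_{k \ne j} E(A_j A_k) \right] $$

$$ = \left[ \frac{1}{2M+1} \right]^2 \left[ \frac{E(A_j^2)}{N} + \frac{N(N-1)}{N^2}\, E^2(A_j) \right], $$

so that

$$ \sigma^2\left[ \frac{1}{N(2M+1)} \sum_{j=1}^{N} A_j \right] = \left( \frac{1}{2M+1} \right)^2 \left( \frac{1}{N} \right) \left[ E(A_j^2) - E^2(A_j) \right]. $$

Now consider

$$ E(A_j) = E\left[ \sum_{m=-M}^{M} \eta_j(\tau_{i+m}) \right] = \sum_{m=-M}^{M} E[\eta_j(\tau_{i+m})] = \sum_{m=-M}^{M} p(\tau_{i+m}). $$

However, for $p(\tau)$ continuous and monotonic,

$$ \sum_{m=-M}^{M} p(\tau_{i+m}) = (2M+1)\, p(\tau^{*}) $$

where $\tau_{i-M} \le \tau^{*} \le \tau_{i+M}$ and $E^2(A_j) = (2M+1)^2 p^2(\tau^{*})$.
The term $E(A_j^2)$ is evaluated as follows:

$$ E(A_j^2) = E\left[ \sum_{m=-M}^{M} \eta_j^2(\tau_{i+m}) + \sum_{m=-M}^{M} \eta_j(\tau_{i+m}) \sum_{n \ne m} \eta_j(\tau_{i+n}) \right] $$

$$ = \sum_{m=-M}^{M} p(\tau_{i+m}) + p(\tau^{*}) \sum_{m=-M}^{M} \sum_{n \ne m} p(\tau_{i+n}) $$

since $E[\eta_j^2(\tau_{i+m})] = p(\tau_{i+m})$ and $\sum_{m=-M}^{M} 1 = 2M+1$. Write

$$ \sum_{m=-M}^{M} \sum_{n \ne m} p(\tau_{i+n}) = 2M \sum_{m=-M}^{M} p(\tau_{i+m}) = 2M(2M+1)\, p(\tau^{*}); $$

then

$$ E(A_j^2) = (2M+1)\, p(\tau^{*}) + 2M(2M+1)\, p^2(\tau^{*}) $$

$$ E(A_j^2) - E^2(A_j) = (2M+1) \left[ p(\tau^{*}) - p^2(\tau^{*}) \right] $$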

“’[  ’»,»< V  ]  *  “'[  i^5ET7  j,  A)  J 

-  (  S5T  )  (  S  ) 
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Finally,  then,  for  any  fixed  value  of  M,  0  £  M  <  °®, 


N-“  a*[  N(2Mhl )  ^  Aj  ]  =  °* 


and,  similarly,  for  any  fixed  value  of  N,  1  ^  N  < 

N 


:-*■  a8[  n[2i4i)  1^1  Aj  ]  "  °* 


Lim 
M' 


These last two statements show that for M = 0,

\[ \hat{p}(\tau_i) = \frac{1}{N}\sum_{j=1}^{N} \eta_j(\tau_i) \]

is a consistent estimator of $p(\tau_i)$, and that for N = 1 and for a
value of T, $0 < T < \infty$,

\[ \hat{p}_T(\tau_i) = \frac{1}{2M+1}\sum_{m=-M}^{M} \eta_1(\tau_{i+m}) \]

is a consistent estimator of $p_T(\tau_i)$.
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DESIGNS  OF  EXPERIMENTS  AS  TELESCOPING 
SEQUENCES  OF  BLOCKS 


Arthur  G.  Holms 


National  Aeronautics  and  Space  Administration 
Lewis  Research  Center 
Cleveland,  Ohio 

ABSTRACT. Sequences of orthogonally blocked statistical designs
of experiments are presented for optimum seeking. The sequences are
such that observations from the first block can be used to estimate the
coefficients of a simple model and then be retained and combined with
observations from new blocks so that all acquired observations are used
cumulatively to estimate models of successively greater generality. Such
blocks are said to form a "telescoping" sequence. Specific choices were
motivated by the problem of optimum seeking experiments in alloy
development.

The  designs  consist  of  full  and  fractionally  replicated  two-level  fac¬ 
torial  experiments  with  four  to  eight  factors.  The  sizes  of  the  experiments 
include  8,  16,  32,  and  64  treatments. 

INTRODUCTION.  Optimum  seeking  experiments  have  been  conducted 
by NASA in developing improved engine materials for the supersonic
transport. The use of the designs presented herewith for optimum seeking has
been  discussed  in  reference  1.  In  addition  to  optimum  seeking,  the  designs 
could  be  used  in  many  situations  where  the  experimenting  begins  without 
prior  knowledge  of  the  complexity  needed  for  the  model. 

The  designs  consist  of  two  level  fractional  factorial  experiments 
performed  as  sequences  of  blocks.  The  designs  are  to  be  such  that  the 
first  block  will  be  a  small  fraction  of  the  full  factorial,  but  large  enough 
for  estimating  the  parameters  of  a  first  degree  model.  Successive  blocks 
are  to  be  such  that  all  acquired  data  can  be  used  cumulatively  to  estimate 
models of successively greater generality, with block effects being
uncorrelated with the parameter estimates. The sequences terminate in
designs  that  give  estimates  of  first  degree  and  two  factor  interaction  co¬ 
efficients  and  the  estimates  are  free  of  aliases  with  other  second  degree 
or  lower  order  coefficients.  Without  considering  blocking,  Steve  Webb 
in  reference  2  applied  the  terms  expansible  and  contractible  to  related 
sequences  of  designs. 

Sequences  of  regular  fractions  were  discussed  in  reference  3  by 
Cuthbert  Daniel.  Sequences  of  irregular  fractions  were  discussed  by 
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Peter  John  in  reference  4.  The  general  subject  was  explored  further 
by  Sidney  Adelman  in  reference  5. 

Box  and  Hunter  in  reference  6  recommended  the  use  of  sequences 
of  rotatable  orthogonally  blocked  designs  for  optimum  seeking.  These 
properties  require  that  the  fractions  be  regular  fractions,  that  is,  the 
number of treatments is $1/2^h$ times the number of treatments in a full
factorial experiment, where h is an integer. The designs to be
presented are all regular fractions.

SYMBOLS.

b             number of blocks
E( )          value of ( ) if averaged over infinite number of observations
g             number of independent variables (factors)
h             fractional replicate contains $1/2^h$ times number of treatments
              performed in full two-level factorial experiment
i             index number for trials
j,k           index number for independent variables
$\ell$        g - h
R             resolution level
$X_j$         vector giving levels of $x_{ij}$, i = 1, ..., n
$x_{ij}$      standardized level of $\xi_j$
y             response (observed variate)
$\beta$       unknown population parameter
$\epsilon$    error
$\xi_j$       independent variable, j = 1, ..., g
$\sigma^2$    variance of $\epsilon$
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SIZES  OF  EXPERIMENTS. 

Degrees of Freedom for Lack of Fit. Consider the fitting of a
model equation to a $2^3$ full factorial experiment. The appropriate
equation is as follows:

\[ E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
        + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3
        + \beta_{23} x_2 x_3 + \beta_{123} x_1 x_2 x_3 \qquad (1) \]

The equation illustrates the notation. Main effects are designated by
symbols such as $\beta_1$ and $\beta_2$. Two factor interactions are represented
by symbols such as $\beta_{12}$. The independent variates are represented by
lower case symbols such as $x_1$ and $x_2$.

The number of treatments minus the number of parameters estimated
is the degrees of freedom for lack of fit. The $2^3$ experiment contains
8 treatments, but the optimum seeking begins with a first degree
equation  containing  only  four  parameters,  leaving  four  degrees  of  free¬ 
dom  for  lack  of  fit.  The  final  stage  of  optimum  seeking  includes  the  two 
factor  interactions  so  that  only  one  degree  of  freedom  would  remain  for 
lack  of  fit  (eq.  (1)). 
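The counting rule just stated can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, block effects are ignored, and the interaction model is assumed to contain the mean, the g first-degree coefficients, and all two-factor interaction coefficients.

```python
from math import comb


def n_parameters(g: int, with_interactions: bool) -> int:
    """Mean + g first-degree coefficients (+ all two-factor interactions)."""
    n = 1 + g
    if with_interactions:
        n += comb(g, 2)
    return n


def lack_of_fit_df(treatments: int, g: int, with_interactions: bool) -> int:
    """Treatments minus estimated parameters (block effects not counted)."""
    return treatments - n_parameters(g, with_interactions)


# The 2^3 experiment discussed above: 8 treatments, 3 factors.
print(lack_of_fit_df(8, 3, False))  # first-degree model
print(lack_of_fit_df(8, 3, True))   # model with two-factor interactions
```

For the $2^3$ case this gives the four and one degrees of freedom quoted in the text.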

Some  information  on  the  lack  of  fit  is  always  desirable.  The  de¬ 
grees  of  freedom  for  lack  of  fit  of  the  designs  to  be  presented  vary  from 
0  to  35,  and  designs  are  provided  for  numbers  of  factors  varying  from  4 
to  8.  With  9  factors  the  use  of  a  regular  fraction  requires  128  treatments 
of  which  66  represent  degrees  of  freedom  for  lack  of  fit.  In  other  words, 
an  insistence  on  the  use  of  regular  fractions  does  not  seem  to  be  unduly 
extravagant  unless  there  are  9  or  more  factors.  The  use  of  irregular 
fractions  seems  to  be  appropriate  in  situations  involving  9  or  more  fac¬ 
tors  or  for  lesser  numbers  of  factors,  where  the  experimenting  is  very 
expensive,  and  where  the  relative  error  is  known  to  be  small. 

Resolution  Levels.  The  factorial  experiment  with  conditions  fixed 
at  just  two  levels  of  g  independent  variables  (factors)  permits  the  esti¬ 
mation  of  parameters  representing  the  grand  mean  over  the  experiment, 
the  first-order  effects  of  the  factors,  and  the  results  of  factors  interact¬ 
ing  two  at  a  time,  three  at  a  time,  and  in  all  combinations  up  to  g  at  a 
time. If a fraction $1/2^h$ of this experiment is performed, not all these
parameters  can  be  estimated.  True  response  functions  in  physical  in¬ 
vestigations  are  typically  smooth  enough  that  the  higher  order  coefficients 
of  an  approximating  polynomial  may  be  assumed  to  be  negligible  over  a 
small  enough  range  of  the  experimentation.  Accordingly,  only  the  lower 
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order  coefficients  need  be  estimated;  however,  they  are  allowed  to  be 
biased by (aliased with) coefficients of higher order interactions because
such  coefficients  are  assumed  to  be  negligible. 

Let  the  number  of  factors  in  the  highest  order  interaction  requiring 
estimation  be  e,  and  let  the  number  of  factors  in  the  lowest  order  inter¬ 
action  with  which  it  is  allowed  to  be  aliased  be  c;  then  the  required  reso¬ 
lution  R  of  the  design  is  defined  (ref.  7)  to  be 

R  =  e  +  c 

As  a  minimum  requirement  on  the  first-order  experiments,  the 
coefficients  will  be  allowed  to  be  aliased  with  only  the  coefficients  of  two- 
factor  or  higher  order  interactions.  This  requires  that  R  =  e+c  =  1+2  =  3. 
A  somewhat  improved  design  occurs  if  the  first-order  coefficients  are 
estimated clear of two-factor interactions. This requires that
R = e + c = 1 + 3 = 4.

For the interaction experiments, the estimates of two factor
interaction coefficients should be allowed to be aliased only with higher
order interaction coefficients. This requires that R = e + c = 2 + 3 = 5.
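The three resolution requirements above can be checked with a small sketch. The helper names are hypothetical. The second function relies on the usual characterization of a regular fraction, assumed here, that its resolution equals the number of symbols in its shortest defining contrast; the text uses the same idea later when it asks for contrasts of at least 3 or at least 5 symbols.

```python
def required_resolution(e: int, c: int) -> int:
    """R = e + c: e factors in the highest-order interaction to be estimated,
    c factors in the lowest-order interaction it may be aliased with."""
    return e + c


def design_resolution(words) -> int:
    """Resolution of a regular fraction, taken as the length of its shortest
    defining contrast (each word is a set of factor indices, X0 excluded)."""
    return min(len(word) for word in words)


print(required_resolution(1, 2))  # minimum first-order experiment
print(required_resolution(1, 3))  # first-order coefficients clear of two-factor interactions
print(required_resolution(2, 3))  # interaction experiment

# A half replicate of 2^5 defined by X1X2X3X4X5 meets the last requirement.
print(design_resolution([{1, 2, 3, 4, 5}]) >= required_resolution(2, 3))
```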

The  design  of  the  interaction  experiment  (of  resolution  5)  is  now 
specified  to  be  blocked  into  b  blocks  such  that  any  one  block  will  provide 
a  design  of  resolution  3  for  the  first-degree  model.  As  a  consequence  of 
this  requirement,  the  experimenter  may  switch  at  any  time  from  the 
method  of  steepest  ascents  to  the  method  of  local  exploration  by  complet¬ 
ing  the  b  -  1  uncompleted  blocks  of  the  resolution  5  experiment. 

Occasions  could  arise  in  which  the  experimenter  would  not  wish  to 
proceed immediately from a minimum-size first-degree design to the
design for estimating all interaction coefficients. For example, a design
of  only  eight  treatments  hardly  provides  enough  information  to  test  the 
validity  of  the  first-degree  model.  The  performance  of  a  second  block  of 
eight  treatments  could  lead  to  a  much  better  decision.  Also,  the  experi¬ 
menter  may  have  prior  knowledge  that  certain  interactions  are  negligible 
so  that  he  can  stop  short  of  the  experiment  that  estimates  all  two-factor 
interactions.  For  these  reasons,  the  designs  and  parameter  estimates 
are  given  for  such  intermediate  size  experiments. 

Numbers  of  Factors  and  Block  Sizes.  The  assumption  was  made 
that  a  sequence  of  blocks  should  not  terminate  in  a  total  experiment  that 
contained  less  than  16  treatments,  that  is,  the  assumption  was  made  that 
a  completed  experiment  containing  less  than  16  experimental  units  is  too 
small for any statistical assessment of validity. With 16 treatments, the
smallest  number  of  factors  in  the  (efficient)  unreplicated  experiment  is 
four,  and  therefore  no  designs  were  investigated  having  less  than  four 
independent  variables. 

As  was  shown  in  reference  3,  the  degrees  of  freedom  efficiency  of 
regular  fractions  of  two  level  factorial  experiments  of  resolution  5  be¬ 
comes  and  remains  poor,  and  the  experiment  sizes  become  enormous,  if 
the  number  of  factors  exceeds  8.  The  investigation  was  therefore  limited 
to  4,  5,  6,  7,  and  8  factors. 

The  regular  fractional  factorial  first  degree  experiment  on  four 
factors  requires  a  minimum  of  8  treatments,  whereas  the  regular  frac¬ 
tional  factorial  first  degree  experiment  with  eight  factors  requires  a 
minimum  of  16  treatments.  Correspondingly,  the  sizes  of  the  blocks  are 
limited  to  8  and  16  treatments. 

So  that  the  experimenter  will  always  get  results  on  his  "standard 
conditions"  first,  the  principal  block  will  always  be  given  as  the  first 
block. 


CONSTRUCTION  OF  DESIGNS  AND  ESTIMATES  OF  PARAMETERS. 


Defining Contrasts. The mixed usage of Yates' notation for
treatments and the special notation of the present work is illustrated by
table 1. The treatments are listed in the familiar Yates' notation and
Yates'  order  in  the  first  column.  The  resulting  dependent  variates  are 
listed  in  the  corresponding  order  in  the  second  column.  Lower  case 
symbols like $x_j$ had been used for the independent variates. The full
set  of  levels  of  such  a  variate  is  a  column  vector  of  plus  and  minus  ones 
and  is  represented  by  the  corresponding  upper  case  symbol  as  shown  by 
the column headings. A column heading showing a product means that
elements  from  identical  rows  have  been  multiplied  to  produce  a  new 
column  with  the  same  number  of  rows. 
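Table 1 is not reproduced in this section, so the following Python sketch shows one way its layout could be generated. It assumes the usual Yates' standard order, (1), a, b, ab, c, ..., and codes "low" as -1 and "high" as +1; the function names are illustrative only.

```python
def yates_order(letters):
    """Treatments in Yates' standard order: (1), a, b, ab, c, ac, bc, abc, ..."""
    labels = ["(1)"]
    for letter in letters:
        # Append the existing treatments with the new factor at its high level.
        labels += [letter if lab == "(1)" else lab + letter for lab in labels]
    return labels


def level_columns(letters):
    """Column of -1/+1 levels for each factor, rows in Yates' order."""
    labels = yates_order(letters)
    return {f: [1 if f in lab else -1 for lab in labels] for f in letters}


def multiply(col_u, col_v):
    """Product column: elements from identical rows are multiplied."""
    return [u * v for u, v in zip(col_u, col_v)]


cols = level_columns("abc")
print(yates_order("abc"))
print(multiply(cols["a"], cols["b"]))  # the X1X2 column
print(multiply(cols["a"], cols["a"]))  # any column times itself gives X0
```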

This  rule  of  multiplication  leads  to  such  relations  as 

\[ (X_1X_2)(X_2X_3X_4) = X_1X_0X_3X_4 = X_1X_3X_4 \]

These  operations  are  similar  to  the  more  popular  terminology  in  which: 

(AB)(BCD)  =  AICD  =  ACD 
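Because every column squares to the column of plus ones, multiplying contrasts amounts to keeping the symbols that occur an odd number of times. A minimal sketch, representing each contrast as a set of symbols (a representation chosen here for illustration, not taken from the paper):

```python
def multiply_words(*words):
    """Product of contrast columns named by sets of symbols.

    A symbol that appears an even number of times squares to X0 and drops
    out, so the product is the symmetric difference of the sets.
    """
    result = frozenset()
    for word in words:
        result = result ^ frozenset(word)
    return result


print(sorted(multiply_words({1, 2}, {2, 3, 4})))  # factor indices of X1X3X4
print(sorted(multiply_words("AB", "BCD")))        # letters of ACD
```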

The present usage of symbols such as $\beta_0$, $\beta_{12}$, $X_0$, $X_1X_2$ avoids
such ambiguities as I standing for both the grand mean and the identity
vector, and AB standing for both the interaction parameter $\beta_{12}$ and
the contrast vector $X_1X_2$.

The  general  rules  for  sequences  of  blocked  designs  were  given  in 
reference  3.  Given  now  are  rules  that  are  much  more  narrowly  stated. 

The  purpose  of  the  narrow  statement  is  to  quickly  and  easily  arrive  at 
a list of treatments and aliased parameters that will be in Yates' order.
Thus, if the responses are listed in Yates' order then Yates' computational
procedure will give estimates that will be in the order of easily
identified  sets  of  aliased  parameters.  Actually,  this  narrowly  stated 
procedure  results  in  no  loss  of  generality,  because  the  experimenter  is 
free to assign the symbols $x_1$, $x_2$, ... to his physical variables in any
order  he  chooses. 

Although  designs  are  given  for  numbers  of  factors  from  4  to  8  and 
block  sizes  of  8  and  16,  their  construction  will  be  illustrated  by  only  an 
example with 6 factors and a block size of 8. For this block size the
first 8 rows of table 1 give treatment levels that can be used for the
factors $x_1$, $x_2$, and $x_3$. The design must be completed with orthogonal
levels of $x_4$, $x_5$, and $x_6$. For orthogonality the levels can only be
levels that already occur for columns from $X_1$ to the product $X_1X_2X_3$.
Then multiplying the elements of a new column by the elements from its
equal among the old columns will result in a column of plus ones, namely,
the $X_0$ column.

The first block is to be a $1/2^3$ replicate of the $2^6$ design. The
fractional replication is characterized by $2^3$ defining contrasts of which
3 are independent, and the telescoping requires that some constraints be
placed on the 3 independent defining contrasts. From among the columns
from $X_1$ to the product $X_1X_2X_3$ select 3 (as yet unspecified) columns
and call them U, V, and W. Then

\[ X_4 = U \qquad X_5 = V \qquad X_6 = W \]

\[ UX_4 = X_4^2 = X_0; \qquad VX_5 = X_5^2 = X_0; \qquad WX_6 = X_6^2 = X_0 \]

The underlined items ($UX_4$, $VX_5$, and $WX_6$) are the defining contrasts.
Because they each contain a column not contained in the others, they are
independent, and because
there  are  three  of  them,  they  are  all  of  the  h  =  3  independent  defining  con¬ 
trasts.  The  group  of  defining  contrasts  is  found  by  forming  the  products 
of  the  independent  contrasts  in  all  possible  combinations: 
$UX_4$
$VX_5$
$WX_6$
$UVX_4X_5$
$UWX_4X_6$
$VWX_5X_6$
$UVWX_4X_5X_6$
$UX_4UX_4 = X_0$

The fact that a sequence of telescoping designs is desired will impose
some constraints on the choice of U, V, and W in terms of $X_1$, $X_2$,
and $X_3$.

Defining contrasts are now to be considered for the two blocks that
will constitute a 2/8 replicate. The 16 treatment levels for $x_1$, $x_2$, $x_3$,
and $x_4$ are given in Yates' order by table 1. The columns of levels of
$x_5$ and $x_6$ need to be identical with two of the columns from $X_1$ to
$X_1X_2X_3X_4$ of table 1. Let these columns (as yet unspecified) be called
Y and Z, that is, $X_5 = Y$, $X_6 = Z$ so that the independent defining
contrasts for the 2/8 replicate are $YX_5$ and $ZX_6$. The complete group of
defining contrasts is:

$YX_5$
$ZX_6$
$YZX_5X_6$

In the case of the 4/8 replicate, $X_6$ is set equal to one of the
product columns of a $2^5$ experiment. The defining contrast is symbolized
by $TX_6$.

In summary, the groups of as yet incompletely specified defining
contrasts are:

1/8 replicate        2/8 replicate        4/8 replicate

$X_0$                $X_0$                $X_0$
$UX_4$               $YX_5$               $TX_6$
$VX_5$               $ZX_6$
$WX_6$               $YZX_5X_6$
$UVX_4X_5$
$UWX_4X_6$
$VWX_5X_6$
$UVWX_4X_5X_6$

Some of the constraints of the design problem are that one of the
blocks of the 2/8 replicate must be identical to the 1/8 replicate, and
two of the blocks of the 4/8 replicate must be identical to those of the
2/8 replicate. Thus, for example, the treatment levels of $X_1$, $X_2$,
and $X_3$ associated with $X_5$ of the 2/8 replicate must have 8 points
of identity with the treatment levels of $X_1$, $X_2$, and $X_3$ associated
with $X_5$ in the 1/8 replicate.

These identities are achieved by setting

\[ Y = V \quad \text{or} \quad Y = UVX_4 \]

and also

\[ Z = W \quad \text{or} \quad Z = UWX_4 \]

For the 4/8 replicate, a necessary condition is that

\[ T = Z \]

or that

\[ T = YZX_5 \]


Among  the  preceding  constraints,  desirable  choices  would  result 
in $TX_6$ having at least 5 symbols so that the 4/8 replicate would be of
resolution  5.  Also,  because  each  stage  must  be  of  resolution  3,  all  de¬ 
fining  contrasts  must  contain  at  least  3  symbols.  The  choices  of  U,  V, 
W,  Y,  and  Z  should  be  consistent  with  these  objectives. 

So that the first block will be a principal block (so that it will
contain a treatment with all factors at their "low" levels) the defining
contrasts must be negative if they contain an odd number of symbols, and
positive if they contain an even number of symbols.
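The sign rule can be illustrated by enumerating a block directly. In the sketch below "low" is coded as -1, a defining contrast is written as a (sign, set of factor indices) pair, and the three words are those produced by the choice of U, V, and W made in the example that follows in the text; the helper names are hypothetical.

```python
from itertools import product


def principal_sign(word):
    """Sign a defining contrast needs so the all-low treatment satisfies it."""
    return -1 if len(word) % 2 else 1


def block(signed_words, g):
    """Runs (tuples of -1/+1 for x1..xg) on which every signed word equals +1."""
    runs = []
    for run in product((-1, 1), repeat=g):
        values = []
        for sign, word in signed_words:
            value = sign
            for j in word:
                value *= run[j - 1]
            values.append(value)
        if all(v == 1 for v in values):
            runs.append(run)
    return runs


words = [{1, 2, 4}, {2, 3, 5}, {1, 2, 3, 6}]  # UX4, VX5, WX6 without their signs
first_block = block([(principal_sign(w), w) for w in words], g=6)
all_low = (-1,) * 6
print(len(first_block), all_low in first_block)  # 8 True
```

Giving every contrast a positive sign instead selects a different block of eight treatments, one that does not contain the all-low treatment.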

Suppose that $U = -X_1X_2$, $V = -X_2X_3$ and $W = X_1X_2X_3$.
Multiplying the resulting defining contrasts together in all combinations gives
the group for the 1/8 replicate as listed in table 8. The contrasts with
the larger numbers of symbols are desirable for the 2/8 replicate. They
are attained by selecting $Y = UVX_4$, and $Z = W$, and the defining
contrasts for the 2/8 replicate are:


\[ YX_5 = UVX_4X_5 = X_1X_3X_4X_5 \]

\[ ZX_6 = WX_6 = X_1X_2X_3X_6 \]

\[ YZX_5X_6 = UVWX_4X_5X_6 = X_2X_4X_5X_6 \]

and  these  contrasts  are  listed  as  the  2/8  replicate  in  table  8.  For  the 
4/8  replicate  the  choice  was  T  =  Z  so  that 

\[ TX_6 = ZX_6 = WX_6 = X_1X_2X_3X_6 \]

and the 4/8 replicate fails to be of resolution 5. The question arises
as to whether a better choice could have been made for the defining
contrasts of the 1/8 replicate.
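The worked example can be reproduced mechanically. The sketch below multiplies signed contrasts, again written as (sign, set of factor indices), in all combinations, starting from $U = -X_1X_2$, $V = -X_2X_3$, $W = X_1X_2X_3$. The representation and function names are assumptions of this illustration; table 8 itself is not reproduced here.

```python
from itertools import combinations


def multiply(a, b):
    """Product of two signed contrasts (sign, frozenset of factor indices)."""
    return (a[0] * b[0], a[1] ^ b[1])


def defining_group(generators):
    """All products of the independent defining contrasts (X0 omitted)."""
    group = []
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            word = (1, frozenset())
            for gen in combo:
                word = multiply(word, gen)
            group.append(word)
    return group


def show(word):
    sign, factors = word
    return ("-" if sign < 0 else "") + "".join(f"X{j}" for j in sorted(factors))


ux4 = (-1, frozenset({1, 2, 4}))     # U X4 with U = -X1X2
vx5 = (-1, frozenset({2, 3, 5}))     # V X5 with V = -X2X3
wx6 = (1, frozenset({1, 2, 3, 6}))   # W X6 with W = X1X2X3

eighth = defining_group([ux4, vx5, wx6])            # 1/8 replicate
quarter = defining_group([multiply(ux4, vx5), wx6])  # Y = UVX4, Z = W

print([show(w) for w in eighth])
print([show(w) for w in quarter])
print(max(len(w[1]) for w in eighth))  # longest contrast available for TX6
```

Every contrast of the 2/8 replicate appears in the group of the 1/8 replicate, and no member of that group has more than four symbols, which is why $TX_6$ cannot reach five symbols with this choice.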

Achievement  of  the  highest  possible  resolution  number  at  each 
stage  of  a  sequence  of  telescoping  designs  would  be  helped  if  the  total 
number  of  symbols  in  the  group  of  defining  contrasts  were  as  large  as 
possible. For a $1/2^h$ fraction with g factors the maximum number of
symbols  was  given  in  reference  5  as 

\[ A = g\,2^{h-1} \]


For  the  example  of  six  factors  with  blocks  of  size  8,  this  number  is: 
Replicate      1/8      2/8      4/8

A               24       12        6

If  a  resolution  5  design  is  to  be  achieved  at  the  4/8  replicate,  then 
$TX_6$ must contain at least 5 symbols. From the preceding table, the
number cannot exceed 6. The maximum total number of symbols for the
2/8 replicate is 12 so that the numbers of symbols might be distributed
among the contrasts as follows:

$YX_5$      $ZX_6$      $YZX_5X_6$

3           3           6

To have a resolution 3 design for the 1/8 replicate, all 7 defining
contrasts  must  contain  at  least  3  symbols,  but  the  total  number  cannot 
exceed  24.  For  the  telescoping,  three  of  the  7  defining  contrasts  must 
be  distributed  according  to  one  of  the  three  preceding  distributions  of 
symbols.  Considering  only  the  upper  limit  of  24,  the  possibilities  are: 


(3,  3,  3,  3,  3,  4,  5) 


or 


(3, 3, 3, 3, 3, 3, 6)


The  multiplication  of  two  defining  contrasts  each  containing  3 
symbols  could  result  in  defining  contrasts  of  length  2,  4,  or  6.  Con¬ 
trasts  of  length  2  would  violate  the  condition  that  the  design  must  be  of 
resolution  3.  If  3  contrasts  are  of  length  three,  the  multiplication  of 
all  pairwise  combinations  results  in  3  contrasts  at  least  of  length  4. 
Therefore the preceding combinations are not attainable, that is, a
telescoping sequence cannot lead from a 1/8 replicate of resolution 3 to a
4/8  replicate  of  resolution  5.  The  sequence  must  be  continued  to  the 
full  replicate. 
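The counting argument can be cross-checked by brute force. The sketch below follows the construction used in the text, in which $X_4$, $X_5$, and $X_6$ are each set equal to one of the seven columns from $X_1$ to $X_1X_2X_3$; signs are ignored because they do not affect the number of symbols. It is an independent check written for this illustration, not the procedure of reference 5.

```python
from itertools import combinations, product

# The seven candidate columns X1, X2, X3, X1X2, X1X3, X2X3, X1X2X3.
BASE = [frozenset(s) for r in (1, 2, 3) for s in combinations((1, 2, 3), r)]


def group_lengths(u, v, w):
    """Numbers of symbols in the 7 defining contrasts for X4=U, X5=V, X6=W."""
    gens = [u | {4}, v | {5}, w | {6}]
    lengths = []
    for r in (1, 2, 3):
        for combo in combinations(gens, r):
            word = frozenset()
            for gen in combo:
                word ^= gen
            lengths.append(len(word))
    return lengths


def max_symbols(g, h):
    """Upper limit A = g * 2**(h - 1) on the total number of symbols."""
    return g * 2 ** (h - 1)


# Keep only the choices that give a resolution 3 (or better) 1/8 replicate.
candidates = []
for u, v, w in product(BASE, repeat=3):
    lengths = group_lengths(u, v, w)
    if min(lengths) >= 3:
        candidates.append(lengths)

print(len(candidates))
print(max(max(lengths) for lengths in candidates))  # longest contrast found
print({sum(lengths) for lengths in candidates}, max_symbols(6, 3))
```

Since $TX_6$ must be one of the contrasts of the 1/8 replicate, the absence of any contrast with five or more symbols agrees with the conclusion that the sequence must continue to the full replicate.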

Identification of Parameters Estimated by Yates' Contrasts. The
manner  in  which  defining  contrasts  can  be  obtained  for  telescoping  se¬ 
quences  of  orthogonal  blocks  has  been  illustrated.  Reference  1  shows 
how  the  defining  contrasts  were  used  to  determine  the  detailed  treat¬ 
ments  in  Yates'  order.  Reference  1  also  shows  how  the  results  of  the 
Yates'  computation  are  identified  with  the  appropriate  sets  of  aliased 
parameters. 
In the case of the first-degree experiments, if a two-factor
interaction coefficient is aliased with a single-factor coefficient (if the sum of
a two-factor coefficient and a single-factor coefficient is estimated by a
single contrast), then the two-factor coefficient is assumed to be zero. If
a contrast does not estimate any combination of two-factor or lower order
coefficients, the contrast will be given a name by listing the lowest order
set of interaction coefficients that it does estimate. For example, table 17
lists a treatment bcde, and the Yates' computation would give an estimator
of $\beta_{234}$ in the same row. From table 15 the full set of aliased parameters
can be shown to be $\beta_{234}$, $-\beta_{1245}$, $\beta_{147}$, $\beta_{126}$, $-\beta_{3457}$, $-\beta_{2356}$,
$\beta_{367}$, and $-\beta_{1567}$ of which the lowest order set is
$\beta_{234} + \beta_{147} + \beta_{126} + \beta_{367}$. Those
parameters, the estimates of which are confounded with block effects, will
be identified by attaching an asterisk to the parameters.
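The alias set in the example can be checked for internal consistency. Tables 15 and 17 are not reproduced here, so the sketch below works backward: it takes the eight aliased parameters as read from the (partly damaged) text, multiplies each by $X_2X_3X_4$ to recover the implied defining contrasts, and tests that these form a closed group obeying the principal-block sign rule. The subscripts and signs are therefore assumptions of this reading.

```python
def times(a, b):
    """Product of two signed words (sign, frozenset of factor indices)."""
    return (a[0] * b[0], a[1] ^ b[1])


def word(sign, digits):
    return (sign, frozenset(int(d) for d in digits))


# Aliased parameters as read from the text, with their signs.
aliases = [word(+1, "234"), word(-1, "1245"), word(+1, "147"), word(+1, "126"),
           word(-1, "3457"), word(-1, "2356"), word(+1, "367"), word(-1, "1567")]

# Multiplying each alias by X2X3X4 recovers the defining contrasts (X0 included).
defining = {times(aliases[0], a) for a in aliases}

closed = all(times(a, b) in defining for a in defining for b in defining)
principal = all(sign == (-1) ** len(factors) for sign, factors in defining)

lowest = min(len(factors) for _, factors in aliases)
lowest_order_set = [a for a in aliases if len(a[1]) == lowest]

print(closed, principal)
print(["".join(str(j) for j in sorted(f)) for _, f in lowest_order_set])
```

Under this reading the recovered contrasts do close into a group of eight, as a 1/8 replicate requires, and the lowest order set is the four three-factor coefficients named in the text.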

PROPERTIES OF RECOMMENDED DESIGNS. The designs are identified
by code numbers. For example, Plan 1/8; 7f; 8t/b; 2b means that the
design  is  a  1/8  replicate  of  a  full  factorial  experiment  with  7  factors,  em¬ 
ploying  8  treatments  per  block,  and  using  2  blocks.  The  order  of  presenta¬ 
tion  of  the  designs  (tables  2  to  29)  is  the  order  of  increasing  numbers  of 
factors.  For  a  given  number  of  factors,  a  sequence  of  designs  with  blocks 
of  8  treatments  is  presented  first,  followed  by  a  sequence  of  designs  with 
blocks  of  16  treatments.  Within  any  sequence,  the  order  is  the  order  of  in¬ 
creasing  numbers  of  blocks.  The  properties  of  the  designs  are  summarized 
in  table  30  and  therefore  table  30  serves  as  a  "Table  of  Contents"  for  the 
designs. 
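The plan codes carry a built-in consistency check: the fraction times $2^g$ must equal the treatments per block times the number of blocks. A small parsing sketch follows; the regular expression and function names are hypothetical, written only to match codes of the form shown in this section.

```python
import re
from fractions import Fraction


def parse_plan(code):
    """Split a code such as 'Plan 1/8; 7f; 8t/b; 2b' into its four parts."""
    pattern = r"\s*Plan\s+(\d+)/(\d+)[;,]\s*(\d+)f[;,]\s*(\d+)t/b[;,]\s*(\d+)b\s*"
    m = re.fullmatch(pattern, code)
    if m is None:
        raise ValueError(f"unrecognized plan code: {code!r}")
    num, den, factors, per_block, blocks = map(int, m.groups())
    return Fraction(num, den), factors, per_block, blocks


def is_consistent(code):
    """Fraction of the full factorial must equal treatments per block x blocks."""
    fraction, factors, per_block, blocks = parse_plan(code)
    return fraction * 2 ** factors == per_block * blocks


for code in ("Plan 1/8; 7f; 8t/b; 2b", "Plan 1/2; 4f; 8t/b; 1b",
             "Plan 1/4; 6f; 16t/b; 1b", "Plan 1/2; 6f; 8t/b; 4b"):
    print(code, is_consistent(code))
```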

Use  of  Resolution  4  Designs  in  Fitting  First-Order  Model.  In  gen¬ 
eral,  the  use  of  the  first-order  model  as  a  prediction  equation,  with  coef¬ 
ficients  estimated  from  an  experiment,  requires  the  assumption  that  all 
second-order  parameters  are  zero.  However,  circumstances  might  arise 
where  the  experimenter  desired  an  approximate  first-order  predicting 
equation  and  ignored  the  existence  of  possible  nonzero  two-factor  inter¬ 
actions.  He  might  then  prefer  a  resolution  4  design  to  a  resolution  3  de¬ 
sign  because  the  estimates  of  the  first-order  coefficients  would  not  be 
aliased  with  (biased  by)  two-factor  interactions. 

Minimum-size  designs  of  resolution  4  are  shown  for  4  factors  by 
table  2,  for  5  factors  by  table  5,  and  for  6  factors  by  table  10.  Minimum- 
size  designs  of  resolution  4  for  7  and  8  factors  were  given  by  Natrella 
(ref.  8,  p.  12-18),  and  these  designs  are  also  given  in  tables  28  and  29. 
Unfortunately,  no  success  was  achieved  in  trying  to  include  the  designs 
of tables 28 and 29 in the telescoping sequences of 7- and 8-factor blocked
designs,  that  is,  tables  21  to  27.  However,  the  designs  of  tables  28  and 
29 might be used for the very first trial of a Box-Wilson procedure, when
the  experimenter  believed  that  he  would  be  so  far  from  an  optimum  condi¬ 
tion  that  a  first-order  model  would  be  a  good  enough  approximation. 




After such a trial he could move to a new design center and then elect
a  design  capable  of  being  sequentially  expanded  by  blocks  into  designs 
of  higher  order,  that  is,  the  designs  of  tables  21  or  25. 

Conditions  for  Using  Resolution  3  and  Resolution  4  Designs  in 
Estimating  the  Second-Order  Model.  If  the  experimenter  has  prior 
knowledge that some of the two-factor interactions are zero, he may
be  able  to  choose  the  labels  for  his  factors  so  that  the  nonzero  inter¬ 
action  parameters  can  be  estimated  from  designs  of  less  than  resolu¬ 
tion  5.  The  specific  cases  are  listed: 

Table 2. - Plan 1/2; 4f; 8t/b; 1b. - If one of the factors (for
example $x_1$) does not interact with the other factors, then all the
remaining interactions are estimable (table 2). If $x_1$ is noninteracting, the
estimated parameters are $\beta_0$, $\beta_1$, $\beta_2$, $\beta_{34}$, $\beta_3$, $\beta_{24}$, $\beta_{23}$, and $\beta_4$.

Table 5. - Plan 1/2; 5f; 8t/b; 2b. - The factor believed most
likely to interact with other factors should be labeled $x_4$ because the
plan (table 5) gives unconfounded estimates of $\beta_{14}$, $\beta_{24}$, $\beta_{34}$, and
$\beta_{45}$. If any one of $x_1$, $x_2$, $x_3$, or $x_5$ does not interact with the others
(for example, $x_1$) then all the remaining two-factor interactions are
estimable and the estimated parameters are $\beta_0$, $\beta_1$, $\beta_2$, $\beta_{35}$, $\beta_3$, $\beta_{25}$,
$\beta_{23}$, $\beta_5$, $\beta_4$, $\beta_{14}$, $\beta_{24}$, $\beta_{345}$, $\beta_{34}$, $\beta_{245}$,
$(\beta_{234} + \beta_{145})^*$, and $\beta_{45}$. Under
previously stated assumptions, the estimates of $\beta_{345}$ and $\beta_{245}$
are assumed to be nothing more than random error.

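Table 10. - Plan 1/4; 6f; 8t/b; 2b. - If $x_1$ does not interact with
any other factor, and if $x_2$ does not interact with $x_4$, $x_5$, and $x_6$, then
the parameters estimated are as follows: $\beta_0$, $\beta_1$, $\beta_2$, $\beta_{36}$, $\beta_3$, $\beta_{45}$,
$\beta_{23}$, $\beta_6$, $\beta_4$, $\beta_{35}$, $\beta_{56}$,
$(\beta_{124} + \beta_{156} + \beta_{235} + \beta_{346})^*$, $\beta_{34}$, $\beta_5$, $\beta_{46}$, and the
estimate of $(\beta_{125} + \beta_{146} + \beta_{234} + \beta_{356})$ is assumed to be random error
(table 10).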

Table 11. - Plan 1/2; 6f; 8t/b; 4b. - If the label $x_1$ had been given
to  the  most  likely  noninteracting  factor  in  the  design  of  table  10,  the  per¬ 
formance  of  the  two  augmenting  blocks  of  table  11  would  result  in  a  design 
with  all  interactions  estimable  under  the  minimal  assumptions  that  p^» 
p 2 3,  and  p^g  are  zero. 

Table 13. - Plan 1/4; 6f; 16t/b; 1b. - Assume that there are two
groups of three factors and that each factor does not interact within its
group. Give the factors within one group the labels $x_1$, $x_2$, and $x_6$ and
label the factors of the other group $x_3$, $x_4$, and $x_5$. Then all the
nonzero two-factor interaction coefficients (one factor from each group) are
estimable and are $\beta_{13}$, $\beta_{14}$, $\beta_{15}$, $\beta_{23}$, $\beta_{24}$, $\beta_{25}$, $\beta_{36}$, $\beta_{46}$, and $\beta_{56}$
(table 13).

Table 18. - Plan 1/4; 7f; 8t/b; 4b. - This plan (table 18) becomes
a suitable second-order design under the assumptions that $x_1$ does not
interact with $x_3$, $x_4$ or $x_6$, and that $x_2$, $x_5$, and $x_7$ do not interact
with each other.

Table 21. - Plan 1/8; 7f; 16t/b; 1b. - This plan (table 21) esti¬
mates  two-factor  interactions  if  x^  is  noninteracting,  if  x2  is  noninter¬ 
acting  with  HHyr  M41  Mgr-x»d— M7 ,  and  if  X5  is  noninteracting  with  x4 
and  xg. 


Table 22. - Plan 1/4; 7f; 16t/b; 2b. - This plan (table 22) esti¬
mates  all  two-factor  interactions  if  any  one  of  X},  x2,  x4,  or  xg  does 
not  interact  with  the  other  factors  of  this  group. 

Table 26. - Plan 1/8; 8f; 16t/b; 2b. - This plan (table 26) esti¬
mates  all  interactions  if  xg  is  noninteracting  with  X),  x2,  X3,  xg,  and 
x7,  and  if  x3  is  noninteracting  with  Xj,  x2,  x4,  and  xg.  Thus  the  label 
xg  should  be  given  to  the  least  interacting  variable,  the  label  x3  should 
be  given  to  the  next  least  interacting  variable,  the  labels  x3,  x  5,  and  xj 
should  be  given  to  the  variables  least  likely  to  interact  with  xg,  and  the 
labels  x4  and  xg  should  be  given  to  the  variables  least  likely  to  inter¬ 
act  with  X3. 

CHOICE  OF  BLOCK  SIZE.  The  present  investigation  assumes  that  the 
experimenter  will  wish  to  perform  a  block  of  treatments,  analyze  the 
data,  and  then  perform  another  block  of  treatments,  and  that  the  block 
effects  arise  during  the  interruption  of  the  experimenting  for  analyzing 
data  (furnaces  are  overhauled,  instruments  are  newly  calibrated,  etc.). 
Under  these  assumptions,  block  sizes  8  and  16  are  particularly  appro¬ 
priate  for  experiments  on  4  to  8  factors.  On  the  other  hand,  the  physical 
situation  could  limit  the  experimenter  to  smaller  block  sizes.  Under 
such  limitations,  other  designs  would  have  to  be  synthesized,  and  the 
synthesis  could  be  done  according  to  rules  already  presented. 

Another  reason  for  using  small  block  sizes  is  to  protect  against 
the  hazard  of  missing  values.  If  through  accident,  the  observations  from 
one  or  more  treatments  are  missing  from  a  block,  the  whole  block  could 
be  rerun,  especially  if  it  is  small.  On  the  other  hand,  only  the  missing 
treatments  need  be  run,  if  the  experimenter  can  say  that  no  block  effect 
will  arise  between  the  new  runs  and  the  block  from  which  observations 
are  missing.  If  the  design  is  not  severely  fractionated  (if  the  number  of 
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treatments  is  significantly  larger  than  the  number  of  parameters  esti¬ 
mated),  methods  of  estimating  for  missing  values  may  be  used  (ref.  9 
or  10). 

Some attributes of the proposed designs are summarized in table 30.
In  the  case  of  4  factors,  all  coefficients  are  estimable  from  two  blocks  of 
size  8  and  a  single  block  of  size  16  is  of  no  advantage  in  estimating  the 
parameters  of  a  second-degree  model.  In  the  case  of  7  factors,  the  attain¬ 
ment  of  a  resolution  5  design  requires  64  treatments  for  either  blocks  of 
size 8 or size 16, so that there is no clear advantage in using blocks of
size  16.  With  8  factors,  the  minimum  first-order  design  requires  16 
treatments,  and  this  is  the  only  block  size  presented  for  the  problem  with 
8  factors.  In  the  cases  of  5  and  6  factors,  the  choice  of  a  block  size  of  8 
or  16  is  particularly  complex. 

A comparison of the number of experimental units required in
experimenting with block sizes of 8 and 16 for 5 and 6 factors is given in
table  31.  The  column  headed  "Total  number  of  units  required"  shows  that 
for  five  factors,  the  break-even  point  for  the  two  block  sizes  occurs  at 
three  repetitions  of  the  first-order  experiments.  For  six  factors,  the 
break-even  point  occurs  for  five  repetitions  of  the  first-degree  experi¬ 
ments.  In  other  words,  if  the  experimenter  believes  that  he  will  perform 
many  cycles  of  experimenting  with  the  method  of  steepest  ascents,  he 
should  use  a  block  size  of  8  because  it  uses  a  relatively  smaller  number 
of  experimental  units.  On  the  other  hand,  the  block  of  size  16  uses  a 
relatively  smaller  number  of  experimental  units  in  the  method  of  local 
exploration.  The  block  size  of  16  should  be  used  if  the  experimenter 
believes  he  will  spend  relatively  few  cycles  of  experiments  with  the 
method  of  steepest  ascents,  less  than  three  cycles  with  5  factors  or  less 
than  five  cycles  with  6  factors. 
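
The break-even points quoted above can be checked with a short calculation. The sketch below is illustrative only. It assumes that the first block (8 or 16 treatments) is run at every design center and that the full interaction design is completed only at the final center. The completed design sizes (32 and 16 treatments for five factors, 64 and 32 for six factors) are assumptions read from table 30, not values taken from table 31.

```python
def total_units(first_block_size, final_design_size, centers):
    """Units used when the first block is run at every design center and the
    interaction design is completed only at the last center."""
    return first_block_size * (centers - 1) + final_design_size


def break_even(small, large, max_centers=50):
    """Smallest number of design centers at which both strategies cost the same.

    `small` and `large` are (first_block_size, final_design_size) pairs.
    Returns None if no break-even point is found within max_centers.
    """
    for centers in range(1, max_centers + 1):
        if total_units(*small, centers) == total_units(*large, centers):
            return centers
    return None


# Assumed design sizes (read from table 30, not from table 31):
# five factors: blocks of 8 need the full 32-run design, a block of 16 suffices
# six factors: blocks of 8 need the full 64-run design, blocks of 16 need 32 runs
FIVE_FACTORS = {"small": (8, 32), "large": (16, 16)}
SIX_FACTORS = {"small": (8, 64), "large": (16, 32)}

if __name__ == "__main__":
    print("five factors:", break_even(FIVE_FACTORS["small"], FIVE_FACTORS["large"]))
    print("six factors:", break_even(SIX_FACTORS["small"], SIX_FACTORS["large"]))
```

Under these assumptions the two strategies cost the same at three design centers for five factors and at five for six factors, in agreement with the text.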

Maximum economy could be sought with a mixed strategy. The
experimenter could use the block of size 8 until his intuition told him that
the first-degree model might not be appropriate. He could then switch to
the block of size 16. Its greater number of degrees of freedom for "lack
of fit" would provide better information about the validity of the
first-degree model, and on switching to the method of local exploration, fewer
experimental units would be needed to complete the interaction model than
if the smaller block had been used. Thus with five factors, one or two
experiments of the method of steepest ascents should be performed with
the small block size followed by a switch to the larger block. With six
factors, the break-even point is not reached until the fifth design center.
Furthermore, two blocks of size 8 (table 10) provide a resolution 4
design, whereas the single block of size 16 (table 13) is only a resolution 3
design. With six factors, the best strategy might consist of using blocks
[...] design could be enlarged to that of table 10. If no new design center
were desired, the design could then be augmented to that of table 11.

If the design of table 10 had not shown significant interactions,
experimenting at a new design center could continue with the design of table 9,
but if significant interactions had been shown, the new experimenting
should begin with the design of table 13.

CONCLUDING REMARKS. Sequences of blocked designs of
experiments have been presented that are telescoping, in the sense that
the first block is a design for which main effects are measurable, and
that subsequent blocks, as they are added to the design, allow models
of successively greater generality to be fitted to all acquired
observations at each stage. The sequences terminate in designs for which all
two-factor interactions are measurable.
TABLE 2.a - PLAN 1/2; 4f; 8t/b; 1b - R = 4

[x0 = x1x2x3x4.]

Block   Treatment   Estimated effects
  1     (1)         β0
  1     ad          β1
  1     bd          β2
  1     ab          β12 + β34
  1     cd          β3
  1     ac          β13 + β24
  1     bc          β23 + β14
  1     abcd        β4

aRefs.  tt-(p.  484)  and  t*  (p.  12-16). 
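
The treatments of a half replicate such as table 2 can be generated directly from its defining contrast. The sketch below is illustrative only: it assumes the defining contrast x0 = x1x2x3x4, codes each factor as -1 (low) or +1 (high), and keeps the treatments whose level product is +1. The function name `half_replicate` is hypothetical.

```python
from itertools import product

FACTORS = "abcd"


def half_replicate(factors=FACTORS):
    """Treatments of the half replicate with x1*x2*...*xk = +1.

    A lower-case letter in a label means that factor is at its high level;
    "(1)" denotes all factors at their low level.
    """
    runs = []
    for levels in product((-1, 1), repeat=len(factors)):
        level_product = 1
        for x in levels:
            level_product *= x
        if level_product == 1:
            label = "".join(f for f, x in zip(factors, levels) if x == 1)
            runs.append(label or "(1)")
    return runs


if __name__ == "__main__":
    print(half_replicate())
```

For four factors this yields the eight treatments (1), ab, ac, ad, bc, bd, cd and abcd, the same set as in the single block of table 2, although listed in a different order.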
TABLE 3.a - PLAN 1; 4f; 8t/b; 2b - R = 5

[Block confounding, x1x2x3x4.]

Block   Treatment   Estimated effects (b)
  1     (1)         β0
  2     a           β1
  2     b           β2
  1     ab          β12
  2     c           β3
  1     ac          β13
  1     bc          β23
  2     abc         β123
  2     d           β4
  1     ad          β14
  1     bd          β24
  2     abd         β124
  1     cd          β34
  2     acd         β134
  2     bcd         β234
  1     abcd        β1234*

aRefs.  %  (p.  428)  and  il(p.  12-10). 
b  Asterisk  denotes  confounding  with 
blocks. 
TABLE 4. - PLAN 1/4; 5f; 8t/b; 1b - R = 3

[x0 = -x2x3x4 = x1x2x3x5 = -x1x4x5.]

Block   Treatment   Estimated effects
  1     (1)         β0
  1     ae          β1 - β45
  1     bde         β2 - β34
  1     abd         β12 + β35
  1     cde         β3 - β24
  1     acd         β13 + β25
  1     bc          -β4 + β23 + β15
  1     abce        β5 - β14
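
The alias pattern of a fractional replicate follows mechanically from its defining contrasts. The sketch below is illustrative only: it assumes the defining contrasts of table 4 are x0 = -x2x3x4 = x1x2x3x5 = -x1x4x5, represents each word as a sign and a set of factor indices, and multiplies words by taking the symmetric difference of the sets. The helper names are hypothetical.

```python
from itertools import combinations


def multiply(word1, word2):
    """Product of two signed words; squared factors cancel (x*x = 1)."""
    (sign1, factors1), (sign2, factors2) = word1, word2
    return (sign1 * sign2, factors1 ^ factors2)


def defining_relation(generators):
    """All words obtained as products of one or more generators."""
    words = []
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            word = (1, frozenset())
            for g in combo:
                word = multiply(word, g)
            words.append(word)
    return words


def aliases(effect, words, max_order=2):
    """Signed aliases of `effect` that involve at most `max_order` factors."""
    found = []
    for w in words:
        sign, factors = multiply((1, frozenset(effect)), w)
        if 1 <= len(factors) <= max_order:
            found.append((sign, tuple(sorted(factors))))
    return sorted(found, key=lambda item: item[1])


# Assumed generators for table 4: -x2x3x4 and +x1x2x3x5.
GENERATORS = [(-1, frozenset({2, 3, 4})), (1, frozenset({1, 2, 3, 5}))]
WORDS = defining_relation(GENERATORS)

if __name__ == "__main__":
    print(aliases({1}, WORDS))      # main effect of factor 1
    print(aliases({2, 3}, WORDS))   # interaction of factors 2 and 3
```

Under these assumptions the main effect of factor 1 is aliased with minus the 4-5 interaction, and the 2-3 interaction contrast also carries the 1-5 interaction and minus the main effect of factor 4, which agrees with the entries β1 - β45 and -β4 + β23 + β15 of table 4.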

TABLE 5. - PLAN 1/2; 5f; 8t/b; 2b - R = 4

[x0 = x1x2x3x5; block confounding, -x2x3x4.]

Block   Treatment   Estimated effects (a)
  1     (1)         β0
  1     ae          β1
  2     be          β2
  2     ab          β12 + β35
  2     ce          β3
  2     ac          β13 + β25
  1     bc          β23 + β15
  1     abce        β5
  2     d           β4
  2     ade         β14
  1     bde         β24
  1     abd         β124 + β345
  1     cde         β34
  1     acd         β134 + β245
  2     bcd         β234 + β145*
  2     abcde       β45

(a) Asterisk denotes confounding with blocks.


TABLE 8. - DEFINING CONTRASTS, 6 FACTORS ON BLOCKS OF 8 TREATMENTS

Columns: Source; Defining contrasts for the 1/8 replicate, 1/4 replicate and 1/2 replicate.

[Table body not recoverable from the source scan.]

TABLE 9. - PLAN 1/8; 6f; 8t/b; 1b - R = 3

[x0 = -x1x2x4 = -x2x3x5 = x1x2x3x6 = x1x3x4x5 = -x3x4x6 = -x1x5x6 = x2x4x5x6.]

Block   Treatment   Estimated effects
  1     (1)         β0
  1     adf         β1 - β24 - β56
  1     bdef        β2 - β35 - β14
  1     abe         -β4 + β12 + β36
  1     cef         β3 - β25 - β46
  1     acde        β13 + β26 + β45
  1     bcd         -β5 + β23 + β16
  1     abcf        β6 - β15 - β34
TABLE 15. - DEFINING CONTRASTS WITH 7 FACTORS ON BLOCKS OF 8 TREATMENTS

Columns: Source; Defining contrasts for the 1/16 replicate, 1/8 replicate, 1/4 replicate and 1/2 replicate.

[Table body not recoverable from the source scan.]

TABLE 16. - PLAN 1/16; 7f; 8t/b; 1b - R = 3

[Defining contrasts given by table 15.]

Block   Treatment   Estimated effects
  1     (1)         β0
  1     adeg        β1 - β24 - β35 - β67
  1     bdfg        β2 - β14 - β36 - β57
  1     abef        -β4 + β12 + β37 + β56
  1     cefg        β3 - β15 - β26 - β47
  1     acdf        -β5 + β13 + β46 + β27
  1     bcde        -β6 + β23 + β17 + β45
  1     abcg        β7 - β34 - β25 - β16


TABLE 20. - DEFINING CONTRASTS WITH 7 FACTORS ON BLOCKS OF 16 TREATMENTS

Columns: Defining contrasts for the 1/8 replicate, 1/4 replicate and 1/2 replicate.

[Table body not recoverable from the source scan.]

TABLE 21. - PLAN 1/8; 7f; 16t/b; 1b - R = 3

[Defining contrasts given in table 20.]


Block 

Treatment 

Eetlmnted  effect* 

1 

1 

1 

1 

1 

1 

* 

1 

1 

1 

1 

1 

1 

1 

1 

1 

(1) 

aef 

w « 

■beg 

og 

aeefg 

bcf 

abce 

defg 

adg 

bde 

abdf 

cdef 
nod 
be  deg 
ebedfg  I 

* 0 

*1 ' *4R 

fa  ’  Hi 

Ha *  *4a 

*3 

*13  +  Hi 

Ha *  Hi 
'Hi 

H  ■  *18 
‘H  *  *u +  Hi 
Hi*  *ie +  *27 
*8  "  *« 

*24  f  *27 

-*38 

*7 

*2«  ♦  *17 

TABLE 22.a - PLAN 1/4; 7f; 16t/b; 2b - R = 4


[Defining  contrasts  given  In  table  20;  block  confounding,  -XjX^Xj.] 


Block 

Treatment 

Estimated  effects 

Block 

Treatment 

Estimated  effects 
(b) 

1 

■■ 

0O 

B 

eg 

% 

2 

mm 

D 

aef 

015 

1 

BK 

02 

H 

bef 

025 

2 

ab 

312  +  ^46 

B 

abeg 

"037 

1 

eg 

*3 

ce 

035 

2 

acf 

013 

i 

acefg 

-027 

1 

bef 

023 

2 

bcefg 

2 

abeg 

-^87 

1 

abce 

-07 

2 

df 

0< 

1 

defg 

048 

1 

adg 

+  32fl 

2 

ada 

0148  +  0288 

2 

bdg 

024  +  H  8  ■ 

i 

bde 

0248  +  0188 

1 

abdf 

06 

2 

abdefg 

088 

2 

cdfg 

034 

1 

edef 

-087 

1 

acd 

0134  *  0238 

2 

aedeg 

"0247  ‘  0167 

2 

bed 

0234 +  313S 

1 

bedeg 

-0147  -  0g67 

1 

abedfg 

038  ■ 

2 

abedef 

“047 

> 


(b) Asterisk denotes confounding with blocks.
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X3X4X5X6X7X8  X3X4X5X8X7X8  X3X4X5X8X7X8 


TABLE 26. - PLAN 1/8; 8f; 16t/b; 2b - R = 3


[Defining  contraets  given  in  table  24;  block  confounding,  -XjX^Xj] 


Block 

Treatment 

Estimated  effects 

Block 

Treatment 

Estimated  effect* 
(a) 

1 

(1) 

*0 

2 

eg 

*5 

2 

alg 

*1 

1 

a*f 

*15  +  *78 

1 

bgh 

*2  *  *38 

2 

beh 

*25 

2 

•bfh 

*12 

1 

abefgh 

‘*37 

1 

cfgh 

*3  ‘  *28 

2 

cefh 

*35 

2 

ach 

*13  +  *46 

i 

acegh 

’*27 

1 

bet 

'*8  +  *23 

2 

bcefg 

‘*58  '  *17 

2 

abcg 

'*57  '  *18 

1 

abce 

'*7 

2 

df 

*4 

1 

defg 

*45 

adg 

*14  +  *86 

2 

ade 

*145  +  *356  +  *478 

bdfgh 

*24 

1 

bdefh 

‘*67 

abdh 

'*88 

2 

abdegh 

"*347  '  *568  '  *167 

cdgh 

*34  +  *16 

1 

edeh 

*345  *  *156  +  *678 

1 

acdfh 

*6 

2 

acdefgh 

*56 

2 

bed 

"*48 

1 

bedeg 

'*147  “  *367  '  *458 

1 

abedfg 

*26 

2 

abedef 

'*47 

(a) Asterisk denotes confounding with blocks.


TABLE  30.  -  ATTRIBUTES  OF  RECOMMENDED  DESIGNS 


Columns: Table; Replication; Factors, K; Treatments per block; Number of blocks; Resolution, R; Number of two-factor interactions, K(K - 1)/2; Number of estimable two-factor interactions (a).

Cl 

1/3 

4 

8 

1 

4 

0 

0 

1 

Full 

4 

8 

2 

5 

6 

6 

B 

1/4 

5 

8 

3 

10 

0 

1/2 

8 

8 

4 

10 

4 

B 

Full 

5 

8 

S 

10 

10 

1/3 

5 

18 

1 

10 

10 

■ 

1/0 

8 

1 

3 

18 

0 

10 

1/4 

8 

8 

4 

15 

9 

11 

1/3 

8 

8 

4 

0 

is 

Full 

8 

8 

8 

18 

15 

la 

1/4 

8 

18 

1 

3 

18 

0 

14 

1/2 

• 

18 

2 

5 

18 

15 

10 

1/18 

8 

1 

3 

21 

0 

17 

1/8 

8 

2 

3 

21 

0 

18 

1/4 

7 

8 

4 

21 

11 

10 

1/2 

7 

8 

8 

8 

21 

21 

31 

1/8 

7 

18 

1 

3 

21 

1 

1/4 

7 

18 

2 

4 

21 

15 

1/2 

7 

18 

4 

5 

21 

21 

B 

1/18 

8 

18 

» 

mm 

— 

o 

1/8 

8 

18 

3 

■ 

27 

1/4 

It 

18 

4 

Hflui 

mm 

□ 

1/8 

16 

1 

4 

21 

mm 

30 

1/10 

MM 

18 

1 

4 

28 

I1H1 

(a) Only unconfounded two-factor interaction estimators are counted.


TABLE 31. - COMPARISON OF TOTAL TREATMENTS (EXPERIMENTAL UNITS)
REQUIRED WHEN FIRST BLOCK IS PERFORMED TO ESTIMATE FIRST-ORDER
MODEL AT STATED NUMBER OF DESIGN CENTERS AND INTERACTION
EXPERIMENT IS PERFORMED ONLY AT FINAL DESIGN CENTER

[Table body not recoverable from the source scan.]


ON A CLASS OF NONPARAMETRIC TESTS FOR INTERACTIONS IN FACTORIAL EXPERIMENTS*


Pranab  Kumar  Sen 

University  of  North  Carolina  at  Chapel  Hill 

1. Summary and introduction. This paper deals with a class of permutationally
distribution-free aligned rank order tests for interactions in factorial
experiments replicated in complete blocks. The asymptotic power-efficiencies of the
proposed tests with respect to the classical analysis of variance test are also
studied.

Nonparametric analysis of variance tests, available in the literature,
mostly relate to one way or two way (without interaction) layouts. Though the
approach of Lehmann (1964) (see also Puri and Sen (1966)) can be adapted to
construct tests for interactions in factorial experiments, the necessity of avoiding
incompatibility of the unadjusted estimates as well as of estimating some functional
of the parent distribution (appearing in the expression for the dispersion matrix
of the estimators) makes such tests only asymptotically distribution-free and
somewhat tedious to apply. In the present paper, the theory of aligned rank order
tests based on Chernoff-Savage (1958) type of statistics, developed in Sen (1968),
is further extended to provide suitable tests for interactions in factorial layouts
with equal number of observations per cell. Under certain permutational invariance
arguments the nonparametric structure of the proposed tests is established. These
tests are also free from the other two difficulties mentioned earlier. Further,
using a generalization of the Chernoff-Savage (1958) theorem on the asymptotic normality
of rank order statistics to aligned observations, the asymptotic power-efficiencies
of the proposed tests (along with certain bounds) are studied.

* Work supported by the National Institute of Health, Public Health Service, Grant
GM-12868.

2. Preliminary notions. We shall consider in detail only the case of replicated
two factor experiments with one observation per cell and indicate briefly the theory
for the case of several factors and/or observations per cell. The chance variable
$Y_{ijk}$ associated with the yield of the plot placed in the i-th replicate and
receiving the combination of the j-th variety (or level of the first factor) and the k-th
treatment (or level of the second factor), is expressed, in accordance with the
usual fixed-effects model, as

(2.1) $Y_{ijk} = \mu_i + \nu_j + \tau_k + \eta_{jk} + U_{ijk}$, $j = 1, \ldots, p$; $k = 1, \ldots, q$,

where $\mu_1, \ldots, \mu_n$ stand for the replication-effects, $\nu_1, \ldots, \nu_p$ for the variety-effects,
$\tau_1, \ldots, \tau_q$ for the treatment-effects, $\eta_{11}, \ldots, \eta_{pq}$ for the variety x treatment
interactions, and the $U_{ijk}$'s are the residual error components. In (2.1), we may put

(2.2) $\sum_{j=1}^{p} \nu_j = 0$, $\sum_{k=1}^{q} \tau_k = 0$, $\sum_{k=1}^{q} \eta_{jk} = 0$, $j = 1, \ldots, p$, and $\sum_{j=1}^{p} \eta_{jk} = 0$, $k = 1, \ldots, q$.

It is assumed that $(U_{i11}, \ldots, U_{ipq})$, $i = 1, \ldots, n$, are independent and identically
distributed stochastic vectors having a common continuous (joint) cumulative
distribution function (cdf) $F(x_{11}, \ldots, x_{pq})$ which is symmetric in its pq arguments; this
includes the conventional assumption of independence and identity of distributions
of all the npq error components as a particular case. We want to test

(2.3) $H_0: T = ((\eta_{jk})) = 0^{p \times q}$,

against the set of alternatives that $T$ is non-null. By means of the following
intra-block transformations, we eliminate the nuisance parameters $\nu_j$'s and $\tau_k$'s.

Let us consider the p x q matrices

(2.4) $Y_i = (Y_{ijk})$, $U_i = (U_{ijk})$, $Z_i = (Z_{ijk})$ and $E_i = (e_{ijk})$, $i = 1, \ldots, n$,
where we define
where  we  define 
(2.5) [...]

$I_t$ being the identity matrix of order $t$ and $\ell_t$ the (row) $t$-vector having all the $t$
elements equal to unity, $t \geq 1$. Then from (2.1) through (2.6), we obtain that

(2.7) $Z_i = T + E_i$, $i = 1, \ldots, n$.

In the sequel, we shall work with the nuisance parameter-free model (2.7). Also,
we will only consider the case when $p, q \geq 3$. If either of them is less than 3,
the situation simplifies as follows. Suppose $q = 2$, $p > 2$; then from (2.1) and (2.2)
we have $\eta_{j1} = -\eta_{j2} = \eta_j$ (say), $j = 1, \ldots, p$; thus (2.3) reduces to $H_0^*: \eta_1 = \cdots = \eta_p = 0$.
Again from (2.5) and (2.6), we have

(2.8) $Z_{ij1} = -Z_{ij2} = Z_{ij}$ (say), $e_{ij1} = -e_{ij2} = e_{ij}$ (say) for all $i = 1, \ldots, n$.

It follows from lemma 3.1 [to be proved in section 3] that $(e_{i1}, \ldots, e_{ip})$ are
symmetric dependent or interchangeable random variables for all $i = 1, \ldots, n$.
Consequently, based on the set of observations $\{Z_{ij},\ j = 1, \ldots, p;\ i = 1, \ldots, n\}$, the problem
of testing $H_0$ in (2.3) reduces to that of testing the interchangeability of
$(Z_{i1}, \ldots, Z_{ip})$ (for all $i = 1, \ldots, n$), against shift alternatives. As such, the results
of Sen (1968) will directly apply, and the details are omitted. If $p = q = 2$, we have
$\eta_{11} = -\eta_{12} = -\eta_{21} = \eta_{22} = \eta$ (say), and

(2.9) $Z_{i11} = -Z_{i12} = -Z_{i21} = Z_i$ (say), $i = 1, \ldots, n$,

and it is also easily seen that the distribution of $Z_i$ is symmetric about $\eta$. So
the problem of testing $H_0$ in (2.3) reduces to that of testing the symmetry (about zero)
of the distribution of $Z_i$; this is the well known one sample location problem, and
hence is not discussed.
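
The effect of the intra-block alignment can be illustrated numerically. The sketch below is illustrative only: it assumes that the alignment consists of subtracting row and column means and adding back the grand mean within each replicate (the adjustment for row, column and grand means mentioned at the end of this paper), and it uses made-up effect values. With the error terms set to zero, the aligned matrix reproduces the interaction matrix, showing that the replication, variety and treatment effects drop out.

```python
def align(block):
    """Subtract row and column means and add the grand mean (double centering)."""
    p, q = len(block), len(block[0])
    row_means = [sum(row) / q for row in block]
    col_means = [sum(block[j][k] for j in range(p)) / p for k in range(q)]
    grand_mean = sum(row_means) / p
    return [
        [block[j][k] - row_means[j] - col_means[k] + grand_mean for k in range(q)]
        for j in range(p)
    ]


# Hypothetical values for one replicate (p = q = 3), errors set to zero.
MU = 10.0                      # replication effect
NU = [1.0, -1.0, 0.0]          # variety effects
TAU = [2.0, 0.0, -2.0]         # treatment effects
ETA = [[1.0, -1.0, 0.0],       # interactions; every row and column sums to zero
       [-1.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]

Y = [[MU + NU[j] + TAU[k] + ETA[j][k] for k in range(3)] for j in range(3)]
Z = align(Y)

if __name__ == "__main__":
    for row in Z:
        print([round(z, 6) for z in row])
```

Because the interaction matrix already satisfies the side conditions (2.2), the alignment leaves it unchanged while removing every additive term.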


3. The basic permutation principle. We define $U_i$ and $E_i$ ($i = 1, \ldots, n$) as in (2.4)
and (2.6), and let $F^*(x)$ be the joint cdf of $E_i$ for $i = 1, \ldots, n$. Let $\mathbf{j} = (j_1, \ldots, j_p)$
be any permutation of $(1, \ldots, p)$ and $\mathcal{J}$ the set of all possible ($p!$) permutations,
so that $\mathbf{j} \in \mathcal{J}$. Also let

(3.1) [...]

where $\delta$ is the usual Kronecker delta. Now for any $\mathbf{j} \in \mathcal{J}$, $I_p(\mathbf{j})$ is non-singular
and has a unique reciprocal $I_p(\mathbf{j}^*)$ (say), which also belongs to the set
$\{I_p(\mathbf{j}): \mathbf{j} \in \mathcal{J}\}$. Further, it is easily seen that if $I_p(\mathbf{j}_1)$ and $I_p(\mathbf{j}_2)$ be defined
as in (3.1) for $\mathbf{j}_1 \in \mathcal{J}$, $\mathbf{j}_2 \in \mathcal{J}$, then $I_p(\mathbf{j}_1) I_p(\mathbf{j}_2)$ also belongs to $\{I_p(\mathbf{j}): \mathbf{j} \in \mathcal{J}\}$.
Thus the set $\{I_p(\mathbf{j}): \mathbf{j} \in \mathcal{J}\}$ forms a finite group of elementary transformations.

It can also be verified that

(3.2) [...]

Similarly, let $\mathbf{k} = (k_1, \ldots, k_q)$ be any permutation of $(1, \ldots, q)$; the set of all
possible ($q!$) permutations is denoted by $\mathcal{K}$. A second group of elementary
transformation matrices is then defined by

(3.3) [...]

Let us then define a finite group $\mathcal{G}_i$ of transformations $\{g_i(\mathbf{j}, \mathbf{k}): \mathbf{j} \in \mathcal{J}, \mathbf{k} \in \mathcal{K}\}$
by

(3.4) $g_i(\mathbf{j}, \mathbf{k}) Z_i = I_p(\mathbf{j}) Z_i I_q(\mathbf{k})$

for $i = 1, \ldots, n$.
Finally, the group of all the $(p!q!)^n$ transformations in (3.4) is denoted by
$\mathcal{G}_n^*$, i.e.,

(3.5) [...]

As before, we denote the cdf's of $U_i$ and $E_i$ by $F$ and $F^*$, respectively. Let now
$\mathcal{F}$ be the class of all pq-variate continuous cdf's for which the pq variates are
interchangeable. By definition (in section 2), $F \in \mathcal{F}$.

LEMMA 3.1 If $F \in \mathcal{F}$, $F^*$ is $\mathcal{G}_i$-invariant.

PROOF. On defining $U_i$, $I_p(\mathbf{j})$ and $I_q(\mathbf{k})$ as in (2.4), (3.1) and (3.3), we let

(3.6) $U_i(\mathbf{j}, \mathbf{k}) = I_p(\mathbf{j}) U_i I_q(\mathbf{k})$.

Since $F \in \mathcal{F}$, it remains invariant under row (or column) permutations. Hence,
$U_i(\mathbf{j}, \mathbf{k})$ has the same cdf $F$ for all $\mathbf{j} \in \mathcal{J}$, $\mathbf{k} \in \mathcal{K}$. Now, from (2.6), (3.2) and (3.4),
we obtain that

(3.7) $E_i(\mathbf{j}, \mathbf{k}) = I_p(\mathbf{j}) E_i I_q(\mathbf{k}) = (I_p - p^{-1} \ell_p' \ell_p)\, U_i(\mathbf{j}, \mathbf{k})\, (I_q - q^{-1} \ell_q' \ell_q)$.

Thus, the invariance of the cdf of $U_i(\mathbf{j}, \mathbf{k})$ (under $\mathcal{G}_i$) implies the invariance of
the cdf of $E_i(\mathbf{j}, \mathbf{k})$ under $\mathcal{G}_i$. Hence the lemma.

Let now $\mathcal{Z}^*$ be the npq-dimensional (Euclidean) space of the sample point
$Z^* = (Z_1, \ldots, Z_n)$. Then the finite group $\mathcal{G}_n^*$ of transformations in (3.4) and
(3.5) maps the sample space onto itself, and under $H_0$ in (2.3), the distribution
[...] which is analogous to the classical parametric test based on the variance ratio
criterion [cf. Scheffé (1959)].

For small values of n, p and q, the exact permutation distribution of the test
statistic can be obtained by considering the $(p!q!)^n$ (conditionally) equally likely row and
column permutations of the n matrices of aligned observations. This procedure
becomes prohibitively laborious for large values of n, p or q. For this reason,
we consider the following large sample approach.
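
The growth of the permutation set can be made concrete with a small count. The sketch below is illustrative only: it enumerates the row and column rearrangements of a single p x q matrix and evaluates $(p!q!)^n$ for a few hypothetical values of n, p and q. The function names are not taken from the paper.

```python
from itertools import permutations
from math import factorial


def row_column_arrangements(matrix):
    """Yield every matrix obtained by permuting the rows and the columns."""
    p, q = len(matrix), len(matrix[0])
    for rows in permutations(range(p)):
        for cols in permutations(range(q)):
            yield [[matrix[j][k] for k in cols] for j in rows]


def permutation_count(n, p, q):
    """Number of equally likely arrangements for n replicates: (p! q!)^n."""
    return (factorial(p) * factorial(q)) ** n


if __name__ == "__main__":
    example = [[1, 2, 3], [4, 5, 6]]   # a 2 x 3 matrix with distinct entries
    print(len(list(row_column_arrangements(example))))   # 2! * 3! arrangements
    for n in (2, 5, 10):
        print(n, permutation_count(n, 3, 3))
```

Even for p = q = 3 the count exceeds sixty million at n = 5, which illustrates why exact enumeration is practical only for very small layouts.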

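Let us denote the marginal cdf of $Z_{ijk}$ by $F^*_{[jk]}(x)$ and the joint cdf of
$(Z_{ijk}, Z_{ij'k'})$ by $F^*_{[jk,j'k']}(x,y)$ for all $j, j' = 1, \ldots, p$ and $k, k' = 1, \ldots, q$. Let

(4.11) $H(x) = \frac{1}{pq} \sum_{j=1}^{p} \sum_{k=1}^{q} F^*_{[jk]}(x)$,

(4.12) $H_{10}(x,y) = \frac{1}{qp(p-1)} \sum_{k=1}^{q} \sum_{j \neq j'} F^*_{[jk,j'k]}(x,y)$,

(4.13) $H_{01}(x,y) = \frac{1}{pq(q-1)} \sum_{j=1}^{p} \sum_{k \neq k'} F^*_{[jk,jk']}(x,y)$,

(4.14) $H_{11}(x,y) = \frac{1}{p(p-1)q(q-1)} \sum_{j \neq j'} \sum_{k \neq k'} F^*_{[jk,j'k']}(x,y)$.

We denote by $J(u) = \lim_{N \to \infty} J_N(u)$, $0 < u < 1$, and define

(4.15) $\delta^2 = \int_0^1 J^2(u)\,du - \int\!\!\int J[H(x)]\,J[H(y)]\,d[H_{10}(x,y) + H_{01}(x,y) - H_{11}(x,y)]$.

Then, proceeding as in the proof of theorem 4.2 of Puri and Sen (1966) and
omitting the details, we obtain the following theorem.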
THEOREM 4.1 Under the conditions of theorem 1 of Chernoff and Savage (1958),
$[\sigma^2(\mathcal{P}_N) - \delta^2]$ converges in probability to zero.

Now, if we assume that

(4.16) $P\{J[H(Z_{jk})] - J[H(Z_{j'k})] - J[H(Z_{jk'})] + J[H(Z_{j'k'})] = \text{constant}\} < 1$,

for at least one pair of $j \neq j'$ and $k \neq k'$, then as in theorem 4.1 of Sen (1966b), it
can be shown that $\delta^2$, defined by (4.15), is strictly positive. (4.16) will be
termed, in the sequel, the non-degeneracy condition of the cdf $F^*$. The main
theorem of this section is the following.


THEOREM 4.2 Under the conditions of theorem 1 of Chernoff and Savage (1958), the
permutation distribution of $\mathcal{L}_N$ converges to a chi-square distribution with
$(p-1)(q-1)$ degrees of freedom (d.f.).

PROOF. By virtue of (4.7), (4.9) and (4.10), it suffices to show that any non-null
linear combination, with coefficients $a_{jk}$ ($j = 1, \ldots, p$; $k = 1, \ldots, q$), of the
statistics entering (4.10) converges in law (under $\mathcal{P}_N$) to a normal
distribution as $n \to \infty$. Now, using (4.4), (4.5), (4.6) and the first two conditions
of theorem 1 of Chernoff and Savage (1958), we write

(4.17) [...]

where the $a^*_{jk}$'s are linear functions of the $a_{jk}$'s, and they satisfy the constraints that
$\sum_{k=1}^{q} a^*_{jk} = 0$, $j = 1, \ldots, p$, and $\sum_{j=1}^{p} a^*_{jk} = 0$, $k = 1, \ldots, q$. We note that under $\mathcal{P}_N$, $Y_{ni}$
can have only $p!q!$ (conditionally) equally likely values obtained by permuting the
rows and columns of the i-th matrix of ranks, and $Y_{n1}, \ldots, Y_{nn}$
are all stochastically independent (under $\mathcal{P}_N$). Thus, it readily follows that
$E(Y_{ni} \mid \mathcal{P}_N) = 0$, and

(4.18) [...]

Thus, by routine analysis, it follows as in theorem 4.1 that

(4.19) $\frac{1}{n} \sum_{i=1}^{n} E(Y_{ni}^2 \mid \mathcal{P}_N) \to \delta^2 \sum_{j=1}^{p} \sum_{k=1}^{q} (a^*_{jk})^2 > 0$,

where $\delta^2$ is defined by (4.15) and is positive by (4.16). Further, using the
growth condition of theorem 1 of Chernoff and Savage (1958), it follows that

(4.20) [...]

Consequently, using the Berry-Esseen theorem (cf. [4, p. 288]), the asymptotic
normality of the linear combination follows from (4.19) and (4.20). Hence the theorem.

By virtue of theorem 4.2, an asymptotically size $\alpha$ ($0 < \alpha < 1$) test for the
hypothesis of no interaction may be proposed as follows:

(4.21) if $\mathcal{L}_N \geq \chi^2_{(p-1)(q-1),\alpha}$, reject $H_0$ in (2.3); if $\mathcal{L}_N < \chi^2_{(p-1)(q-1),\alpha}$, accept $H_0$;

where $\chi^2_{t,\alpha}$ is the upper $100\alpha\%$ point of a chi-square distribution having $t$ d.f.

5. Asymptotic efficiency of the test based on $\mathcal{L}_N$. It can be easily shown that
the test in (4.21) is consistent for any non-null $T = ((\eta_{jk}))$. For the study of
the asymptotic efficiency of the test based on $\mathcal{L}_N$, we shall therefore consider
the following sequence of Pitman-alternatives, specified by

(5.1) $H_N: T = T_N = N^{-1/2} \Lambda$, $\Lambda = ((\lambda_{jk}))$,

where the $\lambda_{jk}$'s are real and finite and they satisfy

(5.2) $\sum_{j=1}^{p} \lambda_{jk} = 0$, $k = 1, \ldots, q$; $\sum_{k=1}^{q} \lambda_{jk} = 0$ for $j = 1, \ldots, p$.

Thus, under $\{H_N\}$, the cdf of $Z_i$ (defined by (2.5)) is specified by $F^*(x - N^{-1/2}\Lambda)$
(where $x$ is a p x q matrix), and $F^*(x)$ is invariant under the row or column
permutations of $x$. Thus, the univariate marginal cdf $F^*_{[jk]}(x)$ is independent
of $(j,k)$ and is denoted by $H(x)$ [cf. (4.11)]. Similarly, the bivariate
marginal cdf $F^*_{[jk,jk']}(x,y)$ will be independent of $(j, k \neq k')$ and is denoted by
$H_{01}(x,y)$ [cf. (4.13)], $F^*_{[jk,j'k]}(x,y)$ will be independent of $(j \neq j', k)$ and is denoted
by $H_{10}(x,y)$ [cf. (4.12)], and $F^*_{[jk,j'k']}(x,y)$ will be independent of $(j \neq j', k \neq k')$
and is denoted by $H_{11}(x,y)$ [cf. (4.14)].

Now, for arbitrary $F^*(x)$, the asymptotic normality of the suitably centered and
normalized matrix of rank order statistics can be proved
along the same line as in the proof of theorem 3.1 of Sen (1967) (with direct
extension to the matrix case). We shall specifically consider the case when $\{H_N\}$
holds. For this, we define

(5.3) $B(H) = \int_{-\infty}^{\infty} \frac{d}{dx} J[H(x)]\,dH(x)$,

(5.4) $A^2 = \int_0^1 J^2(u)\,du - \left[\int_0^1 J(u)\,du\right]^2$,
(5.5) $\rho_{ij} = A^{-2}\left[\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} J[H(x)]\,J[H(y)]\,dH_{ij}(x,y) - \left(\int_0^1 J(u)\,du\right)^2\right]$,

for $(i,j) = (0,1)$, $(1,0)$ and $(1,1)$, where the $H_{ij}$'s are defined earlier. Then, by the
same technique as in theorem 5.1 of Sen (1967), we obtain that under $\{H_N\}$,

(5.6) [...]

(5.7) [...]

where $B(H)$, $A^2$ and $\rho_{ij}$ are defined in (5.3), (5.4) and (5.5). Again, using (4.15),
theorem 4.1 and some routine computations, it follows that under $\{H_N\}$ in (5.1)

(5.8) $\sigma^2(\mathcal{P}_N) \to A^2(1 - \rho_{10} - \rho_{01} + \rho_{11})$ in probability.

(5.6), (5.7), (5.8) and the asymptotic normality noted above lead to the following
theorem.

THEOREM 5.1 If (i) $\{H_N\}$ in (5.1) holds, (ii) the conditions of theorem 1 of
Chernoff and Savage (1958) hold, $\mathcal{L}_N$, defined by (4.10), has asymptotically a non-central
chi-square distribution with $(p-1)(q-1)$ d.f. and the non-centrality parameter

(5.9) [...]

Referring back to the model in (2.1), let $\sigma^2$ be the variance of $U_{ijk}$ and
$\rho_u$ be the correlation between any two $U_{ijk}$'s belonging to the same block. Let
$F_{(p-1)(q-1),\,(n-1)(pq-1)}$ be the classical analysis of variance ratio test statistic
for testing $H_0$ in (2.3) when the parent cdf is assumed to be normal. Then, it can
be shown that under $\{H_N\}$ in (5.1), $(p-1)(q-1)\,F_{(p-1)(q-1),\,(n-1)(pq-1)}$ has
asymptotically a non-central chi-square distribution with $(p-1)(q-1)$ d.f. and the
non-centrality parameter

(5.10) [...]

Let now $\sigma_e^2$ be the variance of $e_{ijk}$, defined by (2.6). Then, after some
simplifications, we obtain from (2.6) that

(5.11) $\sigma_e^2 = [(p-1)(q-1)/pq]\,\sigma^2 (1 - \rho_u)$.

Consequently, from theorem 5.1, (5.10) and (5.11), we arrive at the following.

THEOREM 5.2 When the conditions of theorem 5.1 hold, the asymptotic relative
efficiency (A.R.E.) of the $\mathcal{L}_N$-test with respect to the classical analysis of
variance test is given by

(5.12) $e = \frac{pq}{(p-1)(q-1)}\,(1 - \rho_{10} - \rho_{01} + \rho_{11})^{-1} \cdot [\sigma_e^2 B^2(H)/A^2]$.

We note that $\sigma_e^2$ is the variance of the cdf $H$, and hence the second factor on the
right hand side of (5.12) resembles the usual efficiency factor for the well-known
Chernoff-Savage (1958) type of test statistics. Also, it follows from lemmas 4.4
and 4.5 of Sen (1968) that

(5.13) $\rho_{10} \geq -1/(p-1)$, $\rho_{01} \geq -1/(q-1)$,

where the equality sign holds iff $J = H^{-1}$ (apart from an additive constant). Thus,
from (5.12) and (5.13), we obtain that
(5.14) $e \geq \left\{1 - \frac{1}{pq}\left[1 - (p-1)(q-1)\rho_{11}\right]\right\}^{-1} \cdot [\sigma_e^2 B^2(H)/A^2]$.

This leads to the following corollary.

COROLLARY 5.2.1 A sufficient condition for the A.R.E. in (5.12) to be at least as large
as $[\sigma_e^2 B^2(H)/A^2]$ is that $\rho_{11} \leq 1/(p-1)(q-1)$.

We shall now consider two special $\mathcal{L}_N$-statistics, namely, the Normal Scores
and Wilcoxon scores statistics. In the first case, the score defined by (4.2) is
the expected value of the $\alpha$-th smallest observation of a sample of size N from a
standard normal distribution, for $\alpha = 1, \ldots, N$. In this case, it is well-known
[cf. Chernoff and Savage (1958)] that $\sigma_e^2 B^2(H)/A^2$ is greater than or equal to 1,
where the equality sign holds only when $H$ is also normal. Thus, the minimum
A.R.E. of the normal scores test with respect to the classical analysis of variance
test is equal to $\{1 - \frac{1}{pq}[1 - (p-1)(q-1)\rho_{11}]\}^{-1}$. On the other hand,
for the class of parent cdf's for which $\rho_{11} \leq 1/(p-1)(q-1)$, the normal scores test
will be at least as efficient as the classical test. In particular, if $H(x)$ is normal,
$\rho_{11} = 1/(p-1)(q-1)$ and $\sigma_e^2 B^2(H)/A^2 = 1$, so that the normal scores test and the
classical test become asymptotically power equivalent. For Wilcoxon scores, the
score of the $\alpha$-th smallest observation is proportional to $\alpha$, $1 \leq \alpha \leq N$.
In this case, $[\sigma_e^2 B^2(H)/A^2]$ is known to be greater than or equal
to 0.864 for all $H$. Consequently, the A.R.E. of the Wilcoxon scores test with
respect to the classical test is bounded below by

(5.15) $0.864 \Big/ \left\{1 - \frac{1}{pq}\left[1 - (p-1)(q-1)\rho_{11}\right]\right\} > 0.432$.

For normal $F$, it is known that

(5.16) $\rho_{10} = \frac{6}{\pi} \sin^{-1}\!\left(\frac{-1}{2(p-1)}\right)$, $\rho_{01} = \frac{6}{\pi} \sin^{-1}\!\left(\frac{-1}{2(q-1)}\right)$, [...]
and hence from (5.12), we obtain that for normal distributions the A.R.E. of the
Wilcoxon-scores test with respect to the classical analysis of variance test is
given by


(5  17)  — — .  ..  —  .  — rP.H — _ —  —  .  _  . . 

rtp-lXl-.H  1  ♦  i  lain'*  ♦  ain’1  ^  +  Sip’1  )  ] 


The following table illustrates the numerical values of (5.17).


Thus, the efficiency is bounded below by $3/\pi$ and may be as high as about 0.97.

If we have more than one observation per cell, we may still work with the
aligned observations obtained by making adjustments for row, column and grand means.
The permutation argument is essentially the same (with p and q replaced by pr and
qr, respectively, r being the number of observations per cell). For more than two
factors, summing over a subset of factors, we may arrive at the desired nuisance
parameter-free model and proceed as in this paper. For brevity, the details are
omitted.
ON THE p-RANK OF THE DESIGN MATRIX OF A DIFFERENCE SET
Madison, Wisconsin


ABSTRACT. Let A be the incidence matrix of a block design constructed
from a relative difference set. Let $r_p$ be the rank mod p of A where p is
a prime. In this paper we find inequalities for $r_p$ and determine $r_p$ completely
in some cases and in particular when A is the incidence matrix of the
hyperplanes of a projective or Euclidean geometry. An inequality for the p-rank
of arbitrary balanced incomplete block designs is also obtained.

INTRODUCTION. A difference set $(v,k,\lambda,h)$ in a group G of order v
relative to a subgroup H of order h is a set of k elements $g_1, \ldots, g_k$ of
G such that the equation

$g_i g_j^{-1} = g$

has exactly $\lambda$ solutions for all $g \notin H$ and no solution for $g \in H$, $g \ne 1$. If h = 1
then the set is called a difference set $v,k,\lambda$.

The blocks

$B_g = \{g_1 g, \ldots, g_k g\}, \quad g \in G,$

form a group divisible design with parameters $\lambda_1 = 0$, $\lambda_2 = \lambda$, which for
h = 1 reduces to a balanced incomplete block design.

Relative difference sets were first introduced by R.C. Bose (1942). The
general definition given here is due to A.T. Butson (1963).

The use of the incidence matrix of such a design (the design matrix of the
difference set, for short) as a check matrix of an error correcting code using
a majority rule decoding procedure was first proposed by L.D. Rudolph in a
master's thesis (1964). These codes were extensively studied and practically
implemented by E.J. Weldon, Jr. (1966, 1967).

In all these codes the alphabet consists of residues mod p, a prime, or
more generally of the elements of a finite field with $p^s$ elements, which we shall
denote by $GF(p^s)$. It is therefore of great practical importance as well as of
theoretical interest to find the rank mod p of such a design matrix, which we shall
sometimes call the p-rank of the difference set.

In section 1 of this paper we shall prove a theorem which for Abelian groups
and for (p, v) = 1 gives an upper bound for this p-rank and which in certain cases
determines it completely.

In the last three sections we shall determine the p-rank for the incidence
matrices of the hyperplanes of EG(m,q) and PG(m,q) (the m-dimensional
Euclidean and projective geometries over GF(q)), which can in fact also be
constructed as design matrices of difference sets. This p-rank has previously
been obtained in special cases by E. J. Weldon (1967) and MacWilliams (1966).
The formula proved in this paper has however already been conjectured by Rudolph
(1967).

Section 1. Let G be an Abelian group and $\mathcal{R}$ the group ring of G over a
field F, whose characteristic is prime to the order v of G. We shall
extensively use the characters $\chi$ of G and $\mathcal{R}$ and in particular the relations

<»  2*«.» 

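(2)  $\sum_{g} \chi(g) = v$ for $\chi = \chi_1$, $\quad \sum_{g} \chi(g) = 0$ for $\chi \ne \chi_1$.

If $A = \sum_g a_g g$ then

(3)  $\frac{1}{v} \sum_{\chi} \chi(A)\,\chi(g^{-1}) = a_g$.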

The notation is explained and formulas (1), (2), (3) are derived in
Mann (1965, pp. 73-75). Note however that we are here writing the groups
multiplicatively.

Let

$A = \sum_g a_g g$

be an element of $\mathcal{R}$. We associate with A the matrix $(a_{gg'^{-1}})$, whose rows
and columns are labeled by the group elements. We wish to find the rank of
$(a_{gg'^{-1}})$, which we shall also call the rank of A.



To this purpose let

$(\chi(g))$

be a matrix whose rows are labeled by the v characters $\chi$ and whose columns
are labeled by the v group elements g. The entry in row $\chi$ and column g is
$\chi(g)$.

We have

(4)  $(\chi(g))\,(a_{gg'^{-1}})\,(\chi(g^{-1}))^T = v\,\mathrm{diag}(\chi(A))$.


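To prove this relation we apply (2) to the element in row $\chi$ and column $\chi'$
of the l.h.s. of (4) and obtain

$\sum_{g}\sum_{g'} \chi(g)\,a_{gg'^{-1}}\,\chi'(g'^{-1}) = \sum_{g} a_g \chi(g) \sum_{g'} \chi(g')\,\chi'(g'^{-1}) = v\,\chi(A)$ if $\chi = \chi'$, and $= 0$ otherwise.
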

This proves (4). Setting A = 1 in (4) we see that the matrix $(\chi(g))$ is
non singular. Hence we have

Theorem 1. Let $A = \sum_g a_g g$ be an element of the group ring of an Abelian
group G of order v over a field F whose characteristic is prime to v. Then the
rank of A is equal to the number of characters $\chi$ of G such that $\chi(A) \ne 0$.
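
Theorem 1 can be checked numerically on a small case. The sketch below is an illustration only and is not part of the original argument. It assumes that G is the cyclic group of order v (written additively as the residues mod v), that F is the field of rationals, and that the characters are $\chi_r(k) = e^{2\pi i r k / v}$. The function names are hypothetical.

```python
import cmath
from fractions import Fraction


def group_matrix(a):
    """Matrix (a_{g g'^{-1}}) for the cyclic group Z_v: entry a[(g - h) mod v]."""
    v = len(a)
    return [[Fraction(a[(g - h) % v]) for h in range(v)] for g in range(v)]


def rank_exact(rows):
    """Rank over the rationals by forward elimination with exact fractions."""
    m = [row[:] for row in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        for i in range(rank + 1, len(m)):
            f = m[i][c] / m[rank][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def count_nonzero_characters(a, tol=1e-9):
    """Number of characters chi_r of Z_v with chi_r(A) != 0 (floating point, tolerance tol)."""
    v = len(a)
    count = 0
    for r in range(v):
        chi_a = sum(a[k] * cmath.exp(2 * cmath.pi * 1j * r * k / v) for k in range(v))
        if abs(chi_a) > tol:
            count += 1
    return count


# A = 1 + g^3 in the group ring of Z_6: chi_r(A) = 1 + (-1)^r vanishes for odd r.
a = [1, 0, 0, 1, 0, 0]
print(rank_exact(group_matrix(a)), count_nonzero_characters(a))
```

For this choice of A both numbers are 3, since $\chi_r(A) = 1 + (-1)^r$ is non-zero exactly for the three even values of r.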


Note that in theorem 1 the coefficients $a_g$ are in F and $\chi(A)$ is an element
of $F(\alpha)$ where $\alpha$ is a vth root of unity over F. We can however apply theorem 1
also to the group ring $\mathcal{R}$ of G over the domain J of integers. To this purpose
consider the field $R(\zeta_v)$ where R is the field of rationals and $\zeta_v$ a primitive vth
root of unity over R. Let $J(\zeta_v)$ be the domain of integers of $R(\zeta_v)$. Let
f(x) be an irreducible factor mod p of the cyclotomic polynomial of order v.
Then since $1, \zeta_v, \ldots, \zeta_v^{\varphi(v)-1}$ is an integral basis for $J(\zeta_v)$ we know (see
Mann 1955 theorem 8.1) that the ideal $(f(\zeta_v), p)$ of $J(\zeta_v)$ is a prime ideal divisor
P of p. Every prime ideal divisor of p can be written in this form and
$f(\zeta_v) \equiv 0\ (P)$. Similarly if $\alpha$ is a root of f(x) over GF(p) then $f(\alpha) = 0$ and
$\alpha$ is a primitive vth root of unity. Moreover the mapping $\zeta_v \to \alpha$ is an
isomorphism $\tau$ mapping the residues mod P into $GF(p)(\alpha)$. Let $\mathcal{R}_p$ be the
group ring of G over GF(p). Then the mapping $\zeta_v \to \alpha$ maps every
character $\chi$ of $\mathcal{R}$ into a character $\chi_p$ of $\mathcal{R}_p$ in such a way that $\tau(\chi(A)) = \chi_p(A)$
for every $A = \sum_g a_g g$. In particular

$\chi(A) \equiv 0\ (P) \iff \chi_p(A) = 0$.

Hence  we  have 

Theorem 2. Let $A = \sum_g a_g g$ be an element of the group ring of an Abelian
group G of order v over the integers. Let (v, p) = 1. The p-rank of the
matrix $(a_{gg'^{-1}})$ is equal to the number of characters $\chi$ such that

$\chi(A) \not\equiv 0\ (P)$

where P is a fixed prime ideal divisor of p in the field of vth roots of unity.

For any set S in G we shall write

$S(t) = \sum_{s \in S} s^t$

and S = S(1).

Let D be a difference set relative to the subgroup H of G. Then by
definition

(6)  $D\,D(-1) = k - \lambda H + \lambda G$.
If $\chi$ is any character of G then $\bar\chi(g) = \chi(g^{-1})$ is also a character and
$\bar\chi = \chi$ if and only if $\chi$ is of order 1 or 2, that is to say $\chi(g) = \pm 1$ for all g. If
$\chi(D)\chi(D(-1)) = \chi(D)\bar\chi(D) \equiv 0\ (P)$ then at most one of $\chi(D)$, $\bar\chi(D)$ is distinct from 0, but both
are zero if $\chi(D) = \bar\chi(D)$. Let t be the number of elements of order 2 in G, $t_1$
the number of elements of order 2 in G/H and set $v_1 = v/h$. We have to consider
the following cases:

  condition on $\chi$              number of $\chi$    number of $\chi$ of order 1 or 2    $\chi(D)\bar\chi(D)$

  $\chi(H) = h$, $\chi(G) = v$     1                1                                 $k^2$

  $\chi(H) = h$, $\chi(G) = 0$     $v_1 - 1$          $t_1$                               $k - \lambda h$

  $\chi(H) = 0$, $\chi(G) = 0$     $v - v_1$          $t - t_1$                           $k$


From this we get

Corollary 2.1. Let D be a $v, k, \lambda, h$ relative difference set, let (p,v) = 1
and $v_1 = v/h$. Let t be the number of elements of order 2 in G, $t_1$ that in G/H.
Let $r_p$ be the p-rank of D; then

$v_1 - 1 \le r_p \le (v + v_1 - 2 - t + t_1)/2$    if $k \equiv 0\ (p)$, $k - \lambda h \not\equiv 0\ (p)$,

$v - v_1 + 1 \le r_p \le (2v - v_1 + 1 - t_1)/2$    if $k \not\equiv 0\ (p)$, $k - \lambda h \equiv 0\ (p)$,

$r_p \le (v - 1 - t)/2$    if $k \equiv 0\ (p)$, $k - \lambda h \equiv 0\ (p)$,

$r_p = v$    if $k \not\equiv 0\ (p)$, $k - \lambda h \not\equiv 0\ (p)$.

Moreover $r_p$ is equal to its upper bound if $(\chi(D), \chi(D(-1)), P) = 1$ for all non
principal $\chi$.
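
The bounds of Corollary 2.1 follow from counting characters in the three classes of the preceding table, and they are easy to tabulate. The sketch below is an illustration and not part of the original paper. The function name is hypothetical, the caller is assumed to supply p prime to v, and the trivial lower bound 0 in the case where both k and k - λh vanish mod p is an addition of this sketch.

```python
def p_rank_bounds(v, k, lam, h, t, t1, p):
    """Return (lower, upper) bounds for the p-rank of a (v, k, lam, h) relative difference set.

    t  = number of elements of order 2 in G
    t1 = number of elements of order 2 in G/H
    Assumes gcd(p, v) == 1, as in Corollary 2.1.
    """
    v1 = v // h
    k_zero = k % p == 0                # k = 0 (mod p)
    n_zero = (k - lam * h) % p == 0    # k - lam*h = 0 (mod p)
    if k_zero and not n_zero:
        return v1 - 1, (v + v1 - 2 - t + t1) // 2
    if not k_zero and n_zero:
        return v - v1 + 1, (2 * v - v1 + 1 - t1) // 2
    if k_zero and n_zero:
        return 0, (v - 1 - t) // 2     # lower bound 0 is trivial (assumption of this sketch)
    return v, v


# Ordinary difference sets (h = 1) of odd order have t = t1 = 0.
print(p_rank_bounds(31, 15, 7, 1, 0, 0, 2))   # parameters 31, 15, 7 with p = 2
print(p_rank_bounds(7, 3, 1, 1, 0, 0, 2))     # parameters 7, 3, 1 with p = 2
```

For the parameters 31, 15, 7 and p = 2 the bounds are 1 and 16, which leaves room for difference sets with the same parameters but different 2-ranks.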

The last condition is always fulfilled if $k \not\equiv 0\ (p^2)$, $k - \lambda h \not\equiv 0\ (p^2)$ because p
has no multiple factors in $J(\zeta_v)$ since (p, v) = 1.
In the case of a difference set $v, k, \lambda$ one has: if $k - \lambda \equiv 0\ (p)$,
$k - \lambda \not\equiv 0\ (p^2)$ then $r_p = (v-1)/2$ if $k \equiv 0\ (p)$ and $r_p = (v+1)/2$ if $k \not\equiv 0\ (p)$ (v must
be odd, otherwise $k - \lambda$ is a square).

Another difference set $v, k, \lambda$ in which corollary 2.1 gives the p-rank for all
p is the difference set D consisting of the quadratic residues mod q where q
is a prime and $q \equiv 3\ (4)$. In this case

$\chi(D) = \sum_{n \ne 0,\ n \equiv x^2\ (q)} \zeta_q^{\,n}, \qquad \chi(D) + \bar\chi(D) = -1$

for every nonprincipal character $\chi$. If $(q+1)/4 \equiv 0\ (p)$ then $r_p = (q+1)/2$.

2  v+1 

If  p  is  odd  and  q  ■  -  v  I  P)  and  A  ■  D  +2*  we  have,  choosing  v  «1(  2) , 
AAC  -1)  -  ^[(q  +  V2  +  (Q-1+  2\)G]  . 

Hence  for  every  nonprincipal  character  x  we  have 

X(  A)  x(  AC  —1) )  *  0(  p)  . 


On  the  other  hand 

(X<  A) ,  X(  A(  -1) ) ,  p)  »(\,p)  ■  1 

and  this  means  that  one  and  only  one  of  x(  A) ,  x(  H  -1) )  is  divisible  by  a  fixed 
prime  divisor  p  of  p  .  Also 

^  0(  p)  for  v  *  1  > 

■  0(  p)  for  v  =  1 . 

Now  if  M(  A) ,  M(  D)  denote  the  matrices  of  A  and  D  respectively  we  have 


XX(A) 


M{  A)  ■  M(  D)  +  ^  I 


where  I  is  the  identity.  Hence 


if  "V  ¥  U  P) , 

p-rank  ( M<  D)  +  I)  . 

2  '  ^  if  v-KP)  . 


The  above  result  was  communicated  to  one  of  the  authors  by  A.  M.  Gleason. 
For difference sets D with parameter values $v, k, \lambda$, where $k - \lambda \equiv 0\ (p^2)$,
the parameters $v, k, \lambda$ do not necessarily determine the p-rank of D. For
instance the difference set 31,15,7 constructed from the hyperplanes of PG(4,2)
by the method of section 2 has by theorem 3 the 2-rank 6, while the quadratic
residue difference set with the same parameters has as we have shown the 2-rank
16.
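
The two 2-ranks quoted here (6 and 16) can be reproduced by direct elimination over GF(2). The sketch below is an illustration, not the authors' method. It assumes that GF(32) is represented through the primitive polynomial $x^5 + x^2 + 1$, and that the hyperplane is the one on which the constant coordinate vanishes; the geometry is the 4-dimensional projective geometry over GF(2), which has 31 points. All function names are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p) by Gauss-Jordan elimination (Python 3.8+)."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], -1, p)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c]:
                f = m[i][c]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def circulant(D, v):
    """Circulant 0/1 matrix whose first row is the incidence vector of D mod v."""
    first = [1 if j in D else 0 for j in range(v)]
    return [first[-i:] + first[:-i] for i in range(v)]


def is_difference_set(D, v, lam):
    """Check that every non-zero residue mod v occurs exactly lam times as a difference."""
    counts = [0] * v
    for a in D:
        for b in D:
            if a != b:
                counts[(a - b) % v] += 1
    return all(c == lam for c in counts[1:])


def quadratic_residues(q):
    """Non-zero quadratic residues mod the prime q."""
    return {x * x % q for x in range(1, q)}


def singer_exponents():
    """Exponents j with constant coordinate of alpha^j equal to 0, alpha a root of x^5 + x^2 + 1."""
    D, elem = set(), 1
    for j in range(31):
        if (elem & 1) == 0:
            D.add(j)
        elem <<= 1                 # multiply by alpha
        if elem & 0b100000:
            elem ^= 0b100101       # reduce by x^5 + x^2 + 1
    return D


qr, singer = quadratic_residues(31), singer_exponents()
print(rank_mod_p(circulant(singer, 31), 2), rank_mod_p(circulant(qr, 31), 2))
```

Both sets pass the difference-set check with parameters 31, 15, 7, yet their circulant matrices have different ranks mod 2.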


Section 2.

Let $q = p^s$, p a prime, and let $v = q^m - 1$. Let E be the field
GF(q) and let $\alpha$ denote a primitive vth root of unity over E. Then
$\alpha \in GF(q^m)$ and is a generator of the multiplicative group of $GF(q^m)$.

The minimal polynomial f(x) of $\alpha$ over E is of degree m. In fact

(7)  $f(x) = \prod_{t=0}^{m-1} (x - \alpha^{q^t})$.

$1, \alpha, \ldots, \alpha^{m-1}$ is a basis of $GF(q^m)$ over GF(q). Hence for $0 \le j \le v-1$
we have

(8)  $\alpha^j = \sum_{i=0}^{m-1} y_{ji}\,\alpha^i$.

The coordinates of the vector

$y_j = (y_{j0}, \ldots, y_{j,m-1})$

will be called the coordinates of $\alpha^j$. The set of these vectors may be regarded
as the non-zero points of a Euclidean geometry EG(m, q) over E.


The points whose coordinates satisfy a non-homogeneous linear equation

(9)  $\sum_{i=0}^{m-1} t_i\,y_{ji} = t_m, \quad t_i \in E,\ t_m \ne 0,$

are the $q^{m-1}$ points of a hyperplane of EG(m, q). The exponents j of the
corresponding powers of $\alpha$ form a difference set D mod v relative to the subgroup
generated by $(q^m - 1)/(q-1)$ (Bose 1942).
Let $\theta = (\theta_0, \theta_1, \ldots, \theta_{v-1})$ be the vector defined by

$\theta_j = 1$ if $j \in D$,
$\theta_j = 0$ otherwise.

We shall say that $\theta$ is the incidence vector of a Euclidean hyperplane (briefly
an E.H. vector). Every hyperplane of EG(m, q) which does not contain the
origin corresponds to an E.H. vector and every incidence vector is a cyclic
permutation of $\theta$. We consider the circulant matrix whose first row is $\theta$ and
shall determine its p-rank in section 4. (This matrix is in fact the design
matrix of D of section 1.)

Let $r = (q^m - 1)/(q-1)$. We have $\alpha^r = u \in E$ and

r-1  *"k' 

Thus the coordinates of the points $1, \alpha, \ldots, \alpha^{r-1}$ represent all the points of a
projective geometry, PG(m-1, q).

The $\alpha^j$, $0 \le j \le r-1$, whose coordinates satisfy a linear homogeneous
relation

(10)  $\sum_{i=0}^{m-1} t_i\,y_{ji} = 0, \quad t_i \in E,$

are the points of a hyperplane of PG(m-1, q). The corresponding values of j
form a difference set mod $(q^m - 1)/(q-1)$ (Singer 1938, also Mann 1965 Theorem
6.1). We wish to determine the p-rank of the design matrix N of this difference
set.

In order to be able to apply the same arguments to the projective and
Euclidean case we shall consider (q-1) replications of the incidence vector of
the difference set.

Let P.D. be the set of exponents j, $0 \le j \le v-1$, such that the coordinates
of $\alpha^j$ satisfy the equation (10). Clearly $j \in$ P.D. iff $j + r \in$ P.D.

Let $\theta^* = (\theta^*_0, \ldots, \theta^*_{v-1})$ be the incidence vector of P.D. defined by

$\theta^*_j = 1$ if $j \in$ P.D.,
$\theta^*_j = 0$ if $j \notin$ P.D.

We shall say that $\theta^*$ is a P.H. (projective hyperplane) vector. The
vector $\theta^*$ consists of q-1 replications of the same incidence vector of the
projective hyperplane defined by (10). Every projective hyperplane of PG(m-1, q)
is represented in this way by a P.H. vector and every P.H. vector is a cyclic
permutation of $\theta^*$.

Let $M(\theta^*)$ be the circulant matrix with first row $\theta^*$. The first r rows
of $M(\theta^*)$ consist of q-1 repetitions of the design matrix N and the p-rank of
$M(\theta^*)$ is clearly equal to the p-rank of N. Hence instead of the p-rank of
N we shall determine the p-rank of $M(\theta^*)$.

Section 3. We now consider again equation (8) and form the matrix

$Q = \begin{pmatrix} y_{00} & \cdots & y_{v-1,0} \\ \vdots & & \vdots \\ y_{0,m-1} & \cdots & y_{v-1,m-1} \end{pmatrix}$.

We consider Q as the check matrix of a code C. The matrix Q has rank
m since the first m columns of Q form a unit matrix. Hence
the code with check matrix Q has rank v - m. If f(x) is the irreducible
polynomial for $\alpha$ over E and

$f(x) = b_0 + b_1 x + \ldots + b_m x^m$

then (8) shows that the v dimensional vector

$(b_0, b_1, \ldots, b_m, 0, \ldots, 0)$

and all its cyclic permutations are vectors of the code with check matrix Q.
But these are precisely v-m independent vectors. Hence f(x) is the generator
polynomial of the code C. Polynomials in the sense used here are residue
classes mod $x^v - 1$ and must always be reduced mod $x^v - 1$. Let

$g(x) = b_m + b_{m-1} x + \ldots + b_0 x^m$

which has $\alpha^{-1}$ as a root and let

(11)  $(x^v - 1)/g(x) = h(x) = h_0 + h_1 x + \ldots + h_{v-m} x^{v-m}$.

We have $g(x)h(x) \equiv 0\ (x^v - 1)$, hence

$b_0 h_0 + b_1 h_1 + \ldots + b_m h_m = 0$

$b_0 h_1 + b_1 h_2 + \ldots + b_m h_{m+1} = 0$

$\ldots$

$b_1 h_0 + \ldots + b_m h_{m-1} = 0$.



This shows that the v dimensional vector

$(h_0, \ldots, h_{v-m}, 0, \ldots, 0)$

and all its cyclic permutations are orthogonal to the vectors of C. Hence the
code generated by h(x) is in the code orthogonal to C and since its degree is
v-m it generates the whole code orthogonal to C, that is to say the code generated
by Q.
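
The relation between f(x), its reversed polynomial g(x) and $h(x) = (x^v - 1)/g(x)$ can be made concrete on the smallest binary case. The sketch below is an illustration under stated assumptions and is not taken from the paper: it takes q = 2, m = 3, v = 7 and $f(x) = 1 + x + x^3$ as the minimal polynomial of $\alpha$. Polynomials are stored as coefficient lists with the constant term first, and the function names are hypothetical.

```python
def poly_divmod(num, den, p):
    """Divide num by den over GF(p); coefficient lists are low degree first (Python 3.8+)."""
    num = [c % p for c in num]
    den = [c % p for c in den]
    inv = pow(den[-1], -1, p)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(num) - len(den), -1, -1):
        coef = (num[i + len(den) - 1] * inv) % p
        quot[i] = coef
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - coef * d) % p
    return quot, num[:len(den) - 1]


def cyclic_shifts(vec):
    """All cyclic permutations of a vector."""
    return [vec[-i:] + vec[:-i] for i in range(len(vec))]


p, v = 2, 7
b = [1, 1, 0, 1]                      # f(x) = 1 + x + x^3 (assumed minimal polynomial)
g = b[::-1]                           # g(x) = b_m + ... + b_0 x^m, the reversed polynomial
x_v_minus_1 = [-1] + [0] * (v - 1) + [1]
h, remainder = poly_divmod(x_v_minus_1, g, p)

# Pad both coefficient vectors to length v and check orthogonality of all cyclic shifts.
b_vec = b + [0] * (v - len(b))
h_vec = h + [0] * (v - len(h))
orthogonal = all(
    sum(x * y for x, y in zip(bs, hs)) % p == 0
    for bs in cyclic_shifts(b_vec)
    for hs in cyclic_shifts(h_vec)
)
print(h, remainder, orthogonal)
```

In this case $h(x) = 1 + x^2 + x^3 + x^4$, the division leaves no remainder, and every cyclic shift of the coefficient vector of f is orthogonal to every cyclic shift of that of h.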


To every linear form

$\sum_{i=0}^{m-1} t_i\,y_i$

corresponds a v dimensional vector

$u = (u_0, \ldots, u_{v-1})$

where

$u_j = \sum_{i=0}^{m-1} t_i\,y_{ji}$.

The vector u is in the code generated by Q, hence

$u(x) = \sum_{j=0}^{v-1} u_j x^j \equiv 0\ (h(x))$.
J-0  1 
On the other hand if

$u(x) = \sum_{i=0}^{v-1} u_i x^i \equiv 0\ (h(x))$

then $(u_0, \ldots, u_{v-1})$ is in the code generated by Q. This shows that all
E.H. vectors $\theta$ can be obtained by setting, for some u(x) = s(x)h(x),

$\theta_j = 1$ if $u_j = t_m \ne 0$,
$\theta_j = 0$ otherwise.

Similarly all P.H. vectors $\theta^*$ can be obtained by setting, for some
u(x) = s(x)h(x),

$\theta^*_j = 1$ if $u_j = 0$,
$\theta^*_j = 0$ otherwise.

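A moment's reflection will show that

(12)  $D(x) = \sum_{j=0}^{v-1} \theta_j x^j = \sum_{j=0}^{v-1} x^j - \sum_{j=0}^{v-1} (u_j - t_m)^{q-1} x^j$

if $\theta$ is an E.H. vector and

(13)  $D^*(x) = \sum_{j=0}^{v-1} \theta^*_j x^j = \sum_{j=0}^{v-1} x^j - \sum_{j=0}^{v-1} u_j^{\,q-1} x^j$

if $\theta^*$ is a P.H. vector.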

By theorem 1 the rank of the design matrix of D equals the number of
vth roots of unity which are not roots of D(x), since the residue ring $E[x]/(x^v - 1)$
is isomorphic to the group ring of a cyclic group of order v.

For any polynomial f(x) we shall say that $\beta$ is a non root of f(x) if
and only if $\beta$ is a vth root of unity and $f(\beta) \ne 0$. We will determine the rank
of D by choosing u(x) so that it will be possible to find the non roots of D(x)
from formulas (12) or (13) respectively.

Since $g(x) = (x^v - 1)/h(x)$ is prime to h(x) we can determine e(x) so that

(14)  $e(x) = \sum_{i=0}^{v-1} e_i x^i \equiv 1\ (g(x)), \qquad e(x) \equiv 0\ (h(x))$.

We then have, if $\beta$ denotes a vth root of unity,

(15)  $e(\beta) = 1$ if $g(\beta) = 0$,
      $e(\beta) = 0$ if $g(\beta) \ne 0$;

moreover the vector $e = (e_0, e_1, \ldots, e_{v-1})$ is a vector of the row space of Q.

Section 4. We first prove two lemmas.

Lemma 1. Let g(x) be any divisor of $x^v - 1$ and let $h(x) = (x^v - 1)/g(x)$.
Let (v, p) = 1, $q = p^s$. Let $\beta$ denote generically a vth root of unity over
GF(q). Let

$a(x) = \sum_{i=0}^{v-1} a_i x^i \equiv 0\ (h(x))$;

then

(16)  $\sum_{g(\beta)=0} a(\beta)\,\beta^{-i} = v\,a_i$.

Proof: From the inversion formula (3) we have

$\sum_{\beta} a(\beta)\,\beta^{-i} = v\,a_i$.

The polynomial $x^v - 1$ has no double roots over GF(q). Hence $h(\beta) = 0$ if
and only if $g(\beta) \ne 0$. But $a(\beta) = 0$ if $h(\beta) = 0$ and this yields (16).

Formula (16) is essentially due to Mattson and Solomon (1961).

Lemma  2.  Let  G  be  an  Abelian  group  of  order  v  over  a  field  F  whose 
characteristic  is  prime  to  v  .  For  each  g  let 

v  -  Z  1  x  X<  g"1)  . 

Let  A  ■  Z  a  ®  then 
g«G  0 

xU)  »ix 
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Proof:  Setting  x(  9)3  X(  9  S  we  have  on  account  of  (  2) 

X(  A )  *7,  a„  X(  g)  =  ~  Tj  Tj  *  X'(  9)  X(  9) 

g«G  v  v  g.GX'  * 

=  ~  S  iv,  Zx'x(9)  3*  . 

X'  X  9  X 

This  proves  lemma  2  . 

Corollary.  Let  P  generlcally  denote  a  vth  root  of  unity  over  a  field  F 
whose  characteristic  is  prime  to  v  .  Put 

^  =  E^P“l  ,  iB0 . v-1, 

Let  f(  x)  ■  Yj  aA  x*  then 

f(P)  3  • 

The corollary follows if we apply lemma 2 to the group ring of a cyclic
group of order v over a field F.

In  particular  the  non  roots  of  f(x)  are  precisely  those  vth  roots  of  unity 
P  for  which  t .  #  0  . 

Lemma 1 applied to e(x) of (14) gives

$v\,e_i = -e_i = \sum_{g(\beta)=0} \beta^{-i} = \sum_{t=0}^{m-1} \alpha^{i q^t}$.

From  ( 12)  and  (13)  we  get  choosing  t  3  -1  in  ( 12) 


t*0 


By the corollary to lemma 2, $\alpha^{-j}$ will be a non-zero of

$D(x) = \sum_{i=0}^{v-1} \theta_i x^i$

if and only if the coefficient of $y^j$ in the expansion of

(17)  $\varphi(y) = 1 - \bigl(1 - \sum_{t=0}^{m-1} y^{q^t}\bigr)^{q-1}$

is not 0. Similarly, $\alpha^{-j}$ will be a non-zero of $D^*(x) = \sum_{i=0}^{v-1} \theta^*_i x^i$ if and only if
$y^j$ occurs in the expansion of

(18)  $1 - \bigl(\sum_{t=0}^{m-1} y^{q^t}\bigr)^{q-1}$

with a non-zero coefficient.

We  shall  carry  out  the  calculations  for  the  Euclidean  case  in  detail.  The 
projective  case  can  be  treated  in  a  similar  way. 

We may write

(19)  $\bigl(1 - \sum_{t=0}^{m-1} y^{q^t}\bigr)^{q-1} = \bigl(1 - \sum_{t=0}^{m-1} y^{q^t}\bigr)^{(p-1)(1 + p + \ldots + p^{s-1})}$.

The exponents occurring in the multinomial expansion of $\bigl(1 - \sum_{t=0}^{m-1} y^{q^t}\bigr)^{p-1}$ are all
of the form

(20)  $j_1 q^{t_1} + \ldots + j_e q^{t_e}$

where $0 \le t_1 < \ldots < t_e \le m-1$ and $j_1 + \ldots + j_e \le p-1$. Moreover the coefficients
of these powers of y are all prime to p. Two exponents of this type are distinct
if $t_1, \ldots, t_e$ or $j_1, \ldots, j_e$ are distinct. Moreover every exponent of type (20)
occurs. Hence (17) becomes


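$\varphi(y) = 1 - \sum_{\tau} c_\tau y^{\tau} \cdot \sum_{\tau} c_\tau y^{p\tau} \cdots \sum_{\tau} c_\tau y^{p^{s-1}\tau}$

$\qquad = 1 - \sum c_{\tau_1} \cdots c_{\tau_s}\, y^{\tau_1 + p\tau_2 + \ldots + p^{s-1}\tau_s}$

where the sums are extended over all numbers $\tau$ of the form (20) and the $c_\tau$
are not 0. Hence we finally get

(22)  $\varphi(y) = \sum_{\sigma} z_\sigma\, y^{\sigma}$

where the sum is extended over all $\sigma > 0$ of the form

(23)  $\sigma = \sum_{i=0}^{s-1} \sum_{j=0}^{m-1} \nu_{ij}\, p^i q^j, \qquad \sum_{j} \nu_{ij} \le p-1, \quad i = 0, \ldots, s-1,$

and $z_\sigma \ne 0$.
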

Let $Q_m(p-1)$ be the number of partitions of all non-negative numbers
$\le p-1$ into m non-negative summands. The number of numbers of the form (23)
is the number of terms in (22) and is given by

$(Q_m(p-1))^s - 1$

and this is the number of non-roots of (12). A similar argument shows that the
non-roots of $D^*(x)$ in (13) are given by the element 1 and by all $\alpha^{-j}$ such that

(24)  $j = \sum_{i=0}^{s-1} \sum_{j'=0}^{m-1} \nu_{ij'}\, p^i q^{j'}, \qquad \sum_{j'} \nu_{ij'} = p-1$.

The number of non roots of $D^*(x)$ in (13) is therefore

$(P_m(p-1))^s + 1$

where $P_m(p-1)$ is the number of partitions of p-1 into m summands.


It is well known that

$P_m(t) = \binom{m+t-1}{t} = \binom{m+t-1}{m-1}, \qquad Q_m(t) = \binom{m+t}{t} = \binom{m+t}{m}$.


Hence we have (note that the projective geometry considered was (m-1)
dimensional)

Theorem 3. The p-rank of the incidence matrix of the hyperplanes of an
m-dimensional Euclidean or projective geometry over $GF(p^s)$ is

$\binom{m+p-1}{m}^{s} + \epsilon$

where $\epsilon = +1$ for the projective and $\epsilon = -1$ for the Euclidean geometry.
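
The formula of theorem 3 is simple to evaluate, and the counting step behind it (the number of ways to write all numbers up to p-1 as ordered sums of m non-negative summands) can be checked by brute force. The sketch below is an illustration only; the function names are hypothetical, and the "partitions" of the text are read as ordered m-tuples of non-negative integers, which is what the binomial coefficients count.

```python
from itertools import product
from math import comb


def hyperplane_p_rank(m, p, s, projective):
    """p-rank given by theorem 3 for the hyperplanes of an m-dimensional geometry over GF(p^s)."""
    eps = 1 if projective else -1
    return comb(m + p - 1, m) ** s + eps


def count_bounded_tuples(m, bound):
    """Number of m-tuples of non-negative integers with sum <= bound (brute force)."""
    return sum(1 for parts in product(range(bound + 1), repeat=m) if sum(parts) <= bound)


print(hyperplane_p_rank(4, 2, 1, projective=True))    # 4-dimensional projective geometry over GF(2)
print(hyperplane_p_rank(2, 2, 2, projective=True))    # projective plane over GF(4)
print(hyperplane_p_rank(2, 2, 1, projective=False))   # Euclidean plane over GF(2)
```

The brute-force count agrees with $\binom{m+t}{m}$ for small m and t, which is the quantity $Q_m(t)$ used in the derivation.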

It  o<t,<, 

then  we  put 

cq(p)  a  • 

It is not difficult to verify from (23) that

(25)  $c_q(\sigma p^i) \le q-1$

for all i and that (23) represents all numbers $< q^m$ which satisfy (25).

Similarly (24) represents all j such that $0 < j < q^m$ and

(26)  $c_q(j p^i) = q-1$

for all values of i.

Section  5.  A  part  of  theorem  2  can  be  generalized  to  balanced  incomplete 
block  designs.  We  shall  prove 

Theorem 4. Let A be the incidence matrix of a balanced incomplete
block design with parameters $v, k, \lambda$ and let $n = k - \lambda$. Let p be a divisor of
n. Then the p-rank of A is at most $(v+\epsilon)/2$ where $\epsilon = 0$ if $k \equiv 0\ (p)$ and
$\epsilon = 1$ otherwise.

We have

(27)  $A A^T = n I + \lambda J \equiv \lambda J\ (p)$,

where J is a v x v matrix all of whose elements are 1.

If B, C are square matrices of order v over any field then (MacDuffee
(1933), chapter I, Corollary 8.3)

(28)  rank (B) + rank (C) $\le$ v + rank (BC).

We may consider A as a matrix over GF(p). The matrix J has rank 1. Hence
the rank of the right side of (27) is $\epsilon$ as defined in theorem 4 and theorem 4 follows.
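
As a small check of theorem 4, one can take the design with parameters 7, 3, 1 and p = 2, which divides n = k - λ = 2. The sketch below is an illustration under stated assumptions, not part of the paper: the blocks are taken to be the translates mod 7 of the set {1, 2, 4}, and the function names are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p) by Gauss-Jordan elimination (Python 3.8+)."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], -1, p)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c]:
                f = m[i][c]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


v, k, lam, p = 7, 3, 1, 2
n = k - lam
blocks = [{(d + i) % v for d in (1, 2, 4)} for i in range(v)]     # assumed construction
A = [[1 if j in blocks[i] else 0 for j in range(v)] for i in range(v)]

# Verify equation (27): A A^T = n I + lam J.
AAT = [[sum(A[i][c] * A[j][c] for c in range(v)) for j in range(v)] for i in range(v)]
expected = [[n + lam if i == j else lam for j in range(v)] for i in range(v)]

eps = 0 if k % p == 0 else 1
bound = (v + eps) // 2
print(AAT == expected, rank_mod_p(A, p), bound)
```

Here k = 3 is odd, so ε = 1 and the bound is 4; the computed 2-rank equals the bound, so the inequality of theorem 4 is attained in this example.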
After completion of this paper it came to the authors' attention that
Theorem 3 had already been obtained for projective geometries by J.M.
Goethals and P. Delsarte. (On a class of Majority logic decodable codes,
forthcoming IEEE Trans. on Information Theory.) Their methods are however
quite different from those presented here.
SOME STATISTICAL METHODS IN MACHINE INTELLIGENCE RESEARCH


I. J. Good

Department  of  Statistics 
Virginia  Polytechnic  Institute 
Blacksburg,  Virginia 


ABSTRACT .  About  a  dozen  examples  are  given  of  the  use  of  statistical 
methods  in  research  on  machine  intelligence,  mostly,  though  not  all, 
previously  known,  but  not  previously  brought  together.  The  topics  include 
the  application  of  rationality  to  the  research  as  a  whole;  the  trading  of 
immediate  gain  for  information;  adaptive  control  without  the  identification 
of  a  model,  by  using  smoothing  techniques;  phoneme  recognition  using 
distinctive features and their derivatives; the compiling of dictionaries;
"botryology" or concept formation by clump-finding; information retrieval;
medical  diagnosis;  game  playing  and  its  relationship  to  theorem  proving; 
design  of  an  alphabet  or  of  a  vocabulary;  and  artificial  neural  networks. 

Among  the  statistical  themes  that  are  emphasized  are  the  estimation  of 
probabilities;  the  use  of  amounts  of  information  and  of  evidence  as  substitutes 
for  utility  when  utility  is  difficult  to  estimate;  decision  trees;  "evolving" 
probabilities;  and  maximum,  minimum,  and  minimax  entropy  in  diagnosis.  In 
this  survey  of  methods  it  has  been  necessary  at  several  points  to  make  do 
with  references  to  the  literature. 

I.  INTRODUCTION.  This  paper  is  concerned  with  examples  of  statistical 
methods in machine intelligence research and is not much concerned with
non-statistical methods. I believe that some of the ideas are new.

One  meaning  of  "intelligence"  is  the  ability  to  adapt  to  a  wide  variety 
of  circumstances  in  the  attainment  of  some  goal  such  as  self-preservation. 

In practice this will always involve many subgoals. This definition involves
both  powers  of  perception  and  intellectual  activity.  I  think  we  have  gone 
further  in  the  mechanization  of  the  intellect  than  of  perception.  Spiders 
and  bees  seem  to  have  better  powers  of  perception  than  any  machines  to  date 
at  least  in  their  powers  of  pattern  recognition.  It  is  not  clear  whether 
perception  should  be  regarded  as  an  attribute  of  intelligence  but  I  shall 
do  so. 

The work on machine intelligence is an attempt to extend the use of
computers into fields where humans and many animals are still supreme,
especially into apparently and actually non-numerical fields, roughly
describable as "information processing." Elementary information processing
could be defined as what can be done using punched cards, sorters, collators
and the like. Machine intelligence might then be roughly equated to advanced
information processing. Some people would insist that the programs or
machines must be adaptive. The subject is still in its infancy: as Oliver
Selfridge remarked, "Artificial intelligence remains tainted with artificiality."

One  aspect  of  intelligence  is  judgment.  You  say  that  a  person  has  used 
judgment  when  you  don't  know  how  he  arrived  at  some  opinion  (19).  This  is 
especially  true  when  one  is  talking  about  one's  own  judgment.  This  could  be 

This  article  appeared  in  Volume  19  #2  pp.  101-110  of  the  Virginia  Journal  of 
Science.  We  would  like  to  thank  the  Editors  of  this  Journal  for  permission 
to republish this article.
called the "Elementary-my-dear-Watson" effect. One approach to machine
intelligence is to discover how judgments are made and then to simulate them.
Machine intelligence research is therefore closely related to experimental
psychology. That is why there is a society, founded in the U.K., called
"A is B", meaning "Artificial Intelligence and the Simulation of Behaviour."
About a third of the members are experimental psychologists.

Some  examples  of  work  on  machine  intelligence  are: 

Machine  translation  and,  more  generally,  "Computational  Linguistics." 

Some  aspects  of  information  retrieval. 

Game  playing. 

Theorem  proving. 

Musical composition and the graphic arts. [See (44), which book will
be based on an exhibition organized by the Institute for Contemporary
Arts.]

Probability  estimation. 

Classification in general.

Included in classification is "pattern recognition", of which there are two kinds:
(i) the recognition that an already specified pattern is present (properly
called "pattern recognition"), (ii) the specification of new patterns, which
is also called the "theory and practice of clumps" or "botryology", from the
Greek βότρυς, a cluster of grapes. There are already 27 words beginning with
"botry" in Funk and Wagnall's English dictionary so one more won't do any
harm. A good name is important: there would be fewer professors of history
if it were called "what happened."

Examples  of  classification  are  the  recognition  of  printed  and  handwritten 
characters,  speech  recognition  Including  the  categorization  of  phonemes,  the 
classification  and  recognition  of  fingerprints,  medical  diagnosis,  and  numerical 
taxonomy.  Rutowitz  (50)  gives  a  short  survey. 

Apart  from  the  simulation  of  thought  processes,  there  has  also  been  some 
work  on  the  simulation  of  neural  networks  (for  example,  (4,  5,  11,  49)).  This 
work is also related to the theory of reliable machines made of unreliable
components (for example, (9, 40)), and borders on the assembly and subassembly
theories of mind (24, 34, 38).

It is possible to regard all statistical methods as an attempt to mechanize
intelligence, since they are concerned with the reduction of judgment to
calculation as far as possible. Perhaps machine intelligence is mainly
concerned  with  new  kinds  of  applications  of  statistical  methods. 

An  excellent  introduction  to  machine  intelligence  research  is  (39). 

II.  EXAMPLES. 

Example  (1).  As  a  first  example  of  the  application  of  statistical 
methods, let's consider the application of the principle of rationality to
the  work  on  machine  intelligence.  The  principle  of  rationality  is  the 
recommendation  to  maximise  expected  utility.  Let  p  be  the  probability  that 
an "ultraintelligent machine" can be built for cost C, where by definition an
ultraintelligent machine is better at every intellectual activity than any
man; and let the value of this machine if it can be built be u. Then it is
easily seen that |pu| > C for almost all C, even if p is small. I have put
the  moduli  signs  in  here  because,  although  it  is  clear  that  u  is  large  it 
is  not  clear  whether  it  is  positive  or  negative. 

Good's  Second  Law  is  that  when  getting  advice  from  consultants  on  whether 
to  undertake  some  project,  it  is  important  to  get  two  different  consultants, 
one  to  estimate  the  probability  of  success  and  the  other  to  estimate  the  value 
if  the  project  is  successful.  If  a  single  consultant  is  asked  to  judge  whether 
to spend an amount C his answer is too much tied up with his own reputation.
If he thinks p is not large he might advise against the expenditure in order
to protect himself, regardless of the size of u.

I  think  this  elementary  point  is  important  and  often  overlooked.  It 
shows  that  a  little  rationality  can  go  a  long  way. 

A division of responsibility between judge and jury is familiar in law
courts, but the jury is usually expected to return a definite verdict instead
of  an  estimated  probability.  It  can  also  fail  to  reach  agreement,  of  course. 

The  term  of  imprisonment  of  a  suspect  ought  to  depend  officially  on  the 
probability  of  guilt.  Perhaps  some  day  everyone  will  have  to  pass  an  examination 
in  the  philosophy  of  probability  before  sitting  on  a  jury,  just  as  drivers  of 
cars  in  the  United  States  have  to  take  a  written  test. 

Example (ii). The two-armed bandit. This problem apparently originated
in connection with the choice between two medical treatments (53). It is
relevant  to  adaptive  control.  Before  discussing  it  I  must  first  refer  to 
"dynamic  programming." 

When  electronic  computers  were  fairly  new,  "programming"  became  a  vogue 
word  and  therefore  the  expressions  "linear  programming",  "mathematical 
programming",  and  "dynamic  programming"  were  introduced  although  they  are 
more  logically  called  "linear  planning",  "mathematical  planning",  and  "dynamic 
(mathematical)  planning"  respectively  since  none  of  them  has  any  necessary 
connection  with  machine  programming.  Richard  Bellman,  who  originated  the 
expression  "dynamic  programming"  agrees  with  this  remark.  The  improved 
terminology  enables  one  to  speak  for  example  of  the  programming  of  dynamic 
planning. 

Dynamic  planning  is  concerned  with  decision  situations  in  which  the  current 
best  decision  cannot  be  conveniently  worked  out  without  working  back  from  the 
future.  One  has  a  decision  tree  which  is  often  stochastic  and  the  payoff  depends 
at  least  partly  on  where  one  ultimately  ends  up  on  the  tree.  For  example,  in 
the game of chess, the strategy of the entire game really depends on analysis
of the end game. This sheds light on the appropriate strategy for the middle
game and that in its turn sheds light on the opening strategy. Thus dynamic
planning is in some respects hundreds of years old.

A  good  example  of  the  use  of  dynamic  planning  is  for  the  two-armed  bandit 
problem  (3,  45,  46,  53,  54,  55,  56).  In  this  problem  we  have  a  gambling 
machine with two arms or handles; we put in a stake and we can choose which of
the  two  arms  to  pull.  Associated  with  each  arm  there  is  an  unknown  fixed 
physical  probability  that  we  shall  receive  a  certain  fixed  reward,  the  same 
reward  for  both  arms.  (There  was  an  electronic  two-armed  bandit  at  Rand 
Corporation  some  years  ago.)  The  question  is,  what  is  the  best  strategy? 

There  are  various  forms  of  this  problem  depending  on  whether  the  game  is  of 
finite  or  infinite  duration.  If  it  is  of  infinite  duration,  it  is  more 
realistic  to  discount  the  future  at  some  rate  although  the  infinite  game  has 
also  been  considered  without  a  discounting  factor.  When  the  game  is  infinite 
and there is no discounting factor, the object is to win in the largest
possible fraction of time in the long run. For this game the solution is the
following intuitively obvious one: Since there will ultimately be a very
high probability that we know which is the handle with the higher probability
of a payoff, we should pull this handle in a proportion of cases tending to one.
The other handle must be pulled in a proportion of cases tending to zero but
nevertheless in a number of cases tending to infinity. This form of the
problem is not of much practical interest, but, with discounting of the future,
it is a rather good model of a typical situation in which we have to decide
whether to go for short-term gains or to pay for additional information. It
is easy to express the problem of finding an optimal solution in terms of some
mathematical  equations  which,  however,  have  never  been  solved  explicitly.  I 
have  discussed  this  problem  several  times  with  Dr.  Michie  of  Edinburgh.  About 
seven  years  ago,  he  suggested  that  the  information  should  be  measured  in  terms 
of  Fisher's  definition  of  amount  of  information  with  some  suitable  choice  of 
units,  in  order  that  the  information  could  be  interpreted  as  a  cash  value. 
However  we  refuted  this  and  we  proposed  that  expected  amount  of  information 
in  Shannon's  sense  or  else  expected  weight  of  evidence  might  be  better.*  This 
we  have  not  yet  refuted  although  in  principle  it  would  be  quite  easy  to  do  so, 
if  the  assumption  is  wrong,  by  means  of  a  computer  program.  Michie  did  write 
a  program  in  1960,  for  solving  the  dynamic  planning  equations  numerically,  but 
it  is  not  yet  quite  flexible  enough  to  deal  with  this  particular  conjecture. 

To  be  more  specific,  the  conjecture  is  that  the  long-term  financial  value  of 
an  act  is  the  sum  of  its  immediate  expected  financial  value  plus  an  amount 
proportional  to  the  expected  amount  of  information  or  to  the  expected  weight 
of  evidence.  (Compare  (36)).  The  expectations  can  be  worked  out  provided 
that we assume some initial distribution for the physical probabilities p and q.
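
As a rough illustration of working back from the future, the following sketch solves a finite-horizon form of the two-armed bandit numerically. It is only a sketch under stated assumptions: independent uniform initial distributions for p and q (so that an arm with s successes and f failures has current win probability (s + 1)/(s + f + 2)), a unit reward, and no discounting. The function name `value` is hypothetical, and this is not Michie's 1960 program.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def value(s1, f1, s2, f2, pulls_left):
    """Expected number of future wins under optimal play.

    (s1, f1) and (s2, f2) are the successes and failures seen so far on
    each arm. Assumption: uniform initial distributions for p and q, so
    the current probability of a win is (s + 1) / (s + f + 2).
    """
    if pulls_left == 0:
        return 0.0
    p1 = (s1 + 1) / (s1 + f1 + 2)
    p2 = (s2 + 1) / (s2 + f2 + 2)
    # Value of pulling each arm: immediate expected reward plus the value
    # of the position reached, averaged over win and loss.
    v1 = (p1 * (1 + value(s1 + 1, f1, s2, f2, pulls_left - 1))
          + (1 - p1) * value(s1, f1 + 1, s2, f2, pulls_left - 1))
    v2 = (p2 * (1 + value(s1, f1, s2 + 1, f2, pulls_left - 1))
          + (1 - p2) * value(s1, f1, s2, f2 + 1, pulls_left - 1))
    return max(v1, v2)
```

With no data and two pulls left, `value(0, 0, 0, 0, 2)` is 13/12, slightly more than the 1 obtainable by pulling without learning; the excess is the value of the information gained from the first pull.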

The two-armed bandit problem occurs when one is trying to decide whether
to  adopt  a  certain  medical  treatment  when  there  are  two  treatments  to  choose 
between.  The  problem  can  of  course  be  generalized  to  a  Hindu-god  bandit  having 
n  arms  or  even  a  continuous  infinity. 

The  infinite  game  with  discounting  of  the  future  is  a  simple  model  for 
the  strategy  of  scientific  research,  or  even  of  adaptive  behaviour  generally, 
and  it  is  relevant  to  certain  types  of  adaptive  control  strategy,  as  in  the 
next  example. 

In the application of the two-armed bandit problem to the choice of a
medical drug we are unfortunately involved with the ethics of experimenting
on people. It would be possible though perhaps impracticable to draw lots
in order to select the patients to be given the treatment currently thought
to be the less effective. This might be fair and would satisfy the
statistician's requirement for randomization.

*The amount of information and the weight of evidence concerning H provided by
E are defined as $I(H:E) = \log [P(E \mid H)/P(E)]$ and
$W(H:E) = \log [P(E \mid H)/P(E \mid \text{not } H)]$ (see (14, 16) and references therein).
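
The two quantities defined in the footnote to this example can be written out directly. The sketch below uses natural logarithms and hypothetical probabilities chosen only for illustration; the function names are assumptions, not notation from the references.

```python
import math


def information(p_e_given_h, p_e):
    """I(H:E) = log[P(E|H) / P(E)], natural logarithms."""
    return math.log(p_e_given_h / p_e)


def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """W(H:E) = log[P(E|H) / P(E|not H)], natural logarithms."""
    return math.log(p_e_given_h / p_e_given_not_h)


# Hypothetical numbers: P(H) = 0.2, P(E|H) = 0.9, P(E|not H) = 0.3.
p_h, p_e_h, p_e_not_h = 0.2, 0.9, 0.3
p_e = p_h * p_e_h + (1 - p_h) * p_e_not_h   # total probability of E
```

By Bayes' theorem the weight of evidence equals the change in the log-odds of H, and the amount of information equals the change in the log-probability of H, which is why either can be treated as additive over independent pieces of evidence.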

Example (iii). Adaptive control: a non-identifying approach. (The
pole-balancer.)

A classical model of a control system is $\dot{x} = f(x, u, t)$, where x is the
state variable (vector), u the control variable, f a known function, and t
is time. There is also a loss function or loss functional. In adaptive
control, f is not usually entirely known, and u is chosen either in the light
of previous "runs" or in the light of the current run or both. Non-adaptive
control  is  rather  like  "dead  reckoning"  in  navigation  and  so  too  is  adaptive 
control  when  it  does  not  depend  on  the  current  run.  A  well-known  simple 
example  is  the  pole-balancing  problem  in  which  we  have  a  pole  hinged  to  the 
top  of  a  cart  which  runs  along  a  finite  straight  track,  with  a  cliff-edge 
at  each  end.  Our  objective  is  to  balance  the  pole  for  as  long  as  possible 
without falling over the edge of the cliff, to stay alive as long as possible
so to speak. A potential application, according to (10), is to the balancing
of a rocket on its launching pad.


[Figure: the cart on its finite track, with a cliff-edge at each end; y is marked along the track.]

Suppose we measure $y$ and $\theta$ at discrete moments of time. At each such moment
we can apply a "bang-bang" control in which a constant force is applied to
the cart either to the left or the right at our choice. We think of the
state of the system as a point in phase-space, with four coordinates say
$x = (y, \dot{y}, \theta, \dot{\theta})$. (Strictly speaking, phase-space uses positions and momenta.)
Our "strategy" can be defined as a function from points x in phase-space to
controls u which take the values L and R. In this example u is a scalar.

The  "cost"  of  our  strategy  can  be  defined  in  various  ways,  for  example  as  a 
decreasing  function  of  the  life-time  of  the  system. 

Even  in  the  theory  of  adaptive  control  it  is  usually  considered  necessary 
to identify the dynamics of the system (see, for example, (10)). But a juggler
can  balance  a  stick  without  explicitly  knowing  any  dynamics,  so  it  must  be 
possible  to  do  the  same  with  a  machine.  It  might  be  expensive  of  course.  Dr. 
Donald  Michie  of  Edinburgh  proposed  dividing  the  phase-space  into  a  small 
number (namely 5 x 5 x 3 x 3) of discrete cells or "boxes," and recording
only  which  box  the  phase  point  is  in  at  any  moment  rather  than  its  exact 
coordinates.  Time  is  taken  as  discrete. 

Each run provides information of the form

$(x_1, u_1), (x_2, u_2), \ldots, (x_n, u_n), \ldots$

where each $u_i$ is L or R (left or right). The idea is to use each run for
learning an improved "strategy" (see below) for the control u as a function
of x. This learning might take place between runs, during runs or both.

You  learn  how  to  live  as  long  as  possible  by  experience  gained  in  previous 
incarnations. A strategy is a function from x to u since we assume that
only  the  positions  and  velocities  are  relevant,  i.e.  there  is  no  hysteresis 
in  the  system,  or  if  there  is  it  is  allowed  for  by  weighting  the  past 
exponentially.  [Barnard,  1959,  made  a  useful  suggestion  about  weighting 
the  past.  He  suggested  that  if  the  current  behaviour  of  a  system  changes 
by  an  unusually  large  amount,  then  the  past  should  temporarily  be  discounted 
at  an  increased  rate.] 

We can define a strategy by imagining a little demon in each box.
A record is made by each demon whose box has been used, corresponding to each
of its uses. This record states whether the bang-bang control was L or R on
each  occasion  and  also  states  the  weighted  average  of  life-times  of  the  runs, 
corresponding  separately  to  L  decisions  and  R  decisions. 

If  the  parameters  of  the  system  are  unvarying,  then  given  a  large 
enough  sample  it  would  ultimately  become  clear  to  each  little  demon  whether 
L  or  R  was  probably  the  better  decision  for  him.  Actually  he  could  never  be 
quite  certain  and  should  occasionally  make  the  apparently  less  good  choice 
merely  in  order  to  gather  information,  as  in  the  two-armed  bandit  problem. 
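
The record kept by one demon might be sketched as follows. This is an illustrative sketch only: the exponential weighting factor `decay`, the fixed exploration probability `explore`, and the class itself are assumptions made for the example and are not taken from Michie's program.

```python
import random


class Demon:
    """One demon per box; keeps weighted life-time averages for L and R."""

    def __init__(self, decay=0.9, explore=0.1):
        self.decay = decay        # assumed exponential weighting of the past
        self.explore = explore    # assumed chance of trying the other control
        self.total = {"L": 0.0, "R": 0.0}    # weighted sum of life-times
        self.weight = {"L": 0.0, "R": 0.0}   # weighted count of uses

    def average(self, u):
        # Unused controls get +inf so that each is tried at least once.
        if self.weight[u] == 0.0:
            return float("inf")
        return self.total[u] / self.weight[u]

    def choose(self, rng=random):
        best = max(("L", "R"), key=self.average)
        # Occasionally make the apparently less good choice to gather
        # information, as in the two-armed bandit problem.
        if rng.random() < self.explore:
            return "R" if best == "L" else "L"
        return best

    def record(self, u, lifetime):
        # Discount the old record for this control, then add the new run.
        self.total[u] = self.decay * self.total[u] + lifetime
        self.weight[u] = self.decay * self.weight[u] + 1.0
```

A full controller would hold one such object for each of the 5 x 5 x 3 x 3 boxes and call `record` for every box visited during a run once the life-time of that run is known.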

The mean life-time, or rather a decreasing function of it, is not
a very good cost function. To use it is too much like trying to teach someone
(or a machine) to play chess by discouraging any move in a game that he
happened  to  lose.  It  is  far  more  efficient  to  make  use  of  sub-goals  for  the 
purpose  of  choosing  positive  and  negative  reinforcements.  (Compare  (19).) 

If scores can be associated with the various cells or boxes, then a score
can  be  associated  with  the  entire  path,  this  time  using  a  discounting  of  the 
future. 

Another point is this. If the dimensionality of the problem is much
more than 4 (which is the number of dimensions of phase space in the pole-
balancing  problem),  the  number  of  cells  or  boxes  is  apt  to  be  extremely  large, 
and  it  will  become  difficult  or  impracticable  to  take  a  large  enough  sample. 

In  this  case  two  different  modifications  of  Michie’s  approach  are  possible. 

(i) Suppose we can make use of spatial continuity. Then each demon can make
use of the statistics acquired by surrounding demons, giving weights that tail
off according to the distance away of the other demons.

(ii) We can ignore continuity but treat the various cells by some extension
of a treatment of multidimensional contingency tables, when estimating
probabilities (23).

If  in  method  (ii)  we  were  to  categorize  the  life-times  also  into 
say  only  two  categories  (above  and  below  some  threshold  varying  with  the 
state of the game but the same for all cells in any one run), then the data
would reduce to a multidimensional contingency table 2 x 2 x 5 x 5 x 3 x 3,
and the methods of (23) could be directly applied. I shall refer
to this work now in greater detail.


Example (iv). Estimation of probabilities in multidimensional contingency
tables. Suppose that a man is teaching a machine to recognize patterns such as
letters of the alphabet, phonemes, diseases, or fingerprints. For diseases the
information would be fed to the machine by punching up cards from long
questionnaires. Thus for each object the machine has a list of attributes and also the
name  of  the  class  to  which  the  object  belongs,  as  supplied  by  the  instructor. 

I shall assume that each attribute is discrete, such as yes-no, and has no
natural  ordering  or,  if  it  has,  the  ordering  can  be  ignored.  This  is  true  for 
the  twelve  "distinctive  features"  of  phonemes,  due  to  Roman  Jakobson,  such  as 
voiced/unvoiced,  strident/mellow,  consonantal/nonconsonantal.  Actually  at 
least  some  of  these  features  can  be  expressed  quantitatively  and  there  are 
reasons  for  thinking  that  we  should  also  record  the  signs  of  their  derivatives 
with  respect  to  time.  This  would  increase  the  dimensionality  of  the  problem 
still  further  (29).  We  would  be  working  in  a  discretized  phase-space  of  at 
most  24  dimensions. 

Each  object  provides  one  entry  in  a  multidimensional  contingency  table. 
Owing to the high dimensionality, the frequency in most of the cells will be
0 or 1. There is then a problem of estimating the probability of each cell.
If we can do this we can obtain the likelihoods of the various letters,
diseases, or crooks, on any future occasion, corresponding to any set of
attributes. For phonemes we should also take into account polyphonemic
statistics  and  similarly  in  medical  diagnosis  the  history  of  the  set  of  symptoms 
of  a  patient  is  relevant. 

An approach to this problem of estimating probabilities is to use the
principle of maximum entropy, that is, to maximize $-\sum_i p_i \log p_i$ subject to
various linear constraints. These linear constraints are obtained by taking
marginal totals in a small enough number of dimensions to obtain adequately
large frequencies. Even without this, the principle generates null hypotheses
for consideration. For example, in two dimensions it generates the null
hypothesis of independence of rows and columns, a null hypothesis that every
statistician would entertain on grounds of simplicity and conventionality.

For a $2 \times 2 \times \cdots \times 2 = 2^m$ table, for which there is only one degree of freedom
when  all  the  marginal  totals  are  known,  it  generates  the  hypothesis  that  the 
product  of  the  probabilities  on  the  black  cells  is  equal  to  the  product  on  the 
white  ones,  when  the  table  is  regarded  as  a  multidimensional  chessboard,  that 
is, the highest-order interaction vanishes. (For m = 3 this hypothesis was
proposed as natural by Fisher. See (2).) The equation is of degree $2^{m-1} - 1$,
but  it  always  has  exactly  one  positive  solution. 
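
For m = 3 the maximum-entropy table can be found numerically without solving the equation explicitly. The sketch below uses iterative proportional fitting, which is an assumed choice of algorithm (the text does not say how the equation is to be solved): starting from the uniform table, it rescales the cells to match each two-way marginal total in turn. Each rescaling multiplies the cells by a function of only two coordinates, so the black and white products stay equal throughout. The function names are hypothetical.

```python
from itertools import product

CELLS = list(product((0, 1), repeat=3))


def fit_max_entropy(target, sweeps=200):
    """Fit a 2x2x2 table to the two-way marginal totals of `target`.

    Starts from the uniform table and rescales it to match each two-way
    margin in turn (iterative proportional fitting, an assumed choice of
    algorithm). `target` must have strictly positive entries.
    """
    p = {c: 1.0 / 8 for c in CELLS}
    for _ in range(sweeps):
        for dropped in range(3):            # axis summed out of the margin
            kept = [a for a in range(3) if a != dropped]
            for key in product((0, 1), repeat=2):
                group = [c for c in CELLS
                         if (c[kept[0]], c[kept[1]]) == key]
                want = sum(target[c] for c in group)
                have = sum(p[c] for c in group)
                for c in group:
                    p[c] *= want / have
    return p


def chessboard_products(p):
    """Products of the probabilities on the 'black' and 'white' cells."""
    black = white = 1.0
    for c in CELLS:
        if sum(c) % 2 == 0:
            black *= p[c]
        else:
            white *= p[c]
    return black, white
```

Applied to any strictly positive 2 x 2 x 2 table, the fitted table reproduces the observed two-way marginal totals while the second-order interaction is removed, which is the smoothing of the observations described in the text.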

For a $d_1 \times d_2 \times \cdots \times d_m$ table with all rth order marginal totals given,
the principle of maximum entropy generates the null hypothesis that all rth
order and higher-order interactions vanish. This is true for more than one
definition of interaction. One definition is the discrete multidimensional
Fourier transform of the logarithms of the probabilities, but Goodman (33)
showed that a real but slightly more complicated definition could be used
without upsetting the results. When the d's are all 2, the discrete Fourier
transform is real and agrees with the definition (59) of interactions used
in  factorial  experiments,  as  pointed  out  in  (17).  The  result  for  any 
contingency  table  can  be  expressed  in  terms  of  all  the  "embedded  binary 
cubes."  Note  that  if  we  accept  a  null  hypothesis  we  are  in  a  position  to 
smooth  the  observations,  that  is,  to  "improve"  them. 

To allow for the finiteness of the sample a reasonable procedure is
to maximize some linear combination of the entropy and the log-likelihood
(23). This is equivalent to selecting the $p_i$'s at the mode of the final
(posterior) distribution if the initial density is proportional to $\prod_i p_i^{-k p_i}$.
I think k = 1 is adequate, but that better would be a density of the form
$\int \prod_i p_i^{-k p_i}\, h(k)\, dk$ by analogy with the work on Bayesian significance tests
for multinomial distributions (31). (This density is a function of the
entropy.)

Example (v). In work on mechanical translation it is necessary to
make  special-purpose  and  general-purpose  dictionaries.  Various  problems 
of  the  following  kind  should  then  arise:  what  is  the  coverage  of  the 
dictionary,  that  is,  what  is  the  probability  that  the  next  word  met  will 
be  one  that  is  already  in  the  dictionary?  And  what  would  be  the  coverage 
if  the  sample  on  which  the  dictionary  was  based  were  doubled?  These  questions 
can  be  answered  by  means  of  the  theory  of  the  sampling  of  species  (15,  32).  For 
example, if $n_r$ is the number of distinct words represented r times in the
sample (that is, if $n_r$ is the frequency of the frequency r), then the coverage
is approximately $1 - n_1/N$ if $n_1$ is large, where N is the sample size. In
fact $n_1$ always is large in practice, however large the sample. The expected
coverage if the sample size is doubled is approximately
$1 - (n_1 - 2n_2 + 3n_3 - \ldots)/N$. One of the basic ideas in this theory was due to Turing (private
communication,  1940):  its  logic  is  extremely  similar  to  that  of  the  empirical 
Bayes  method  and  some  of  the  smoothing  techniques  of  the  species-sampling 
problem  can  be  carried  over  into  the  empirical  Bayes  method  for  other  problems. 
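
The two coverage formulas of this example can be computed from a word sample as in the following sketch. The function name is hypothetical, and the raw frequencies of frequencies are used without any of the smoothing mentioned above, so the alternating series can behave erratically on small samples.

```python
from collections import Counter


def coverage_estimates(words):
    """Return (coverage now, expected coverage if the sample were doubled)."""
    N = len(words)
    word_counts = Counter(words)           # word -> number of times seen
    n = Counter(word_counts.values())      # r -> n_r, frequency of frequency r
    now = 1 - n[1] / N
    # Alternating series n_1 - 2 n_2 + 3 n_3 - ...
    series = sum((-1) ** (r + 1) * r * n_r for r, n_r in n.items())
    doubled = 1 - series / N
    return now, doubled
```

For a sample of 13 words containing six words seen once, two seen twice, and one seen three times, the estimates are 7/13 for the present coverage and 8/13 for the doubled sample.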

This  statistical  problem  does  not  of  course  go  to  the  heart  of 
mechanical  translation  but  its  solution  should  be  known  to  all  workers  in 
this  field  since  the  compilation  of  dictionaries  is  expensive  and  should  be 
organized  rationally. 

Example (vi). Botryology, for example, in information retrieval.

Given  computers  of  very  great  speed  and  capacity  there  are  prospects  of 
automatic  indexing  of  documents,  an  operation  that  normally  requires  rather 
high-grade  effort  and  is  expensive.  The  index  terms  do  not  need  to  be 
existing words: a clump of related words can be regarded as an index term.
One point in making use of clumps (or clusters) is to overcome the difficulty
arising from synonyms. Sometimes the discovery of such a clump will suggest
[...] words and documents, a variety of botryological procedures have been
suggested (see, for example, (21) and its references (6, 47, 52, 58)), and some
of  them  have  been  tried  on  small  collections  of  documents,  such  as  a  few 
hundred.  Most  procedures  suggested  have  involved  a  preliminary  calculation 
of  a  relevance  or  relatedness  matrix,  at  least  resembling  a  correlation  or 
covariance  matrix  of  words  or  of  documents.  I  think  it  is  better  ((24),  pp. 
52-54; (26), pp. 120 and 124) to work directly with the document-word
incidence  matrix,  in  order  to  cut  down  on  the  amount  of  calculation.  This 
will  be  especially  worth  while  when  dealing  with  a  sparse  incidence  matrix, 
which  is  the  usual  situation.  For  an  arbitrary  real  rectangular  matrix 
there is an iterative procedure for obtaining the "singular vectors" which
is analogous to a well-known method for obtaining eigenvectors of a square
symmetric matrix. It can be used for component analysis (57). An elaboration
of it has been suggested for hierarchical botryology, together with a
significance test (26). The process should give clumps of index terms and
associated or conjugate clumps of documents. Similarly if we have an incidence
matrix of symptoms and people, we can look for clumps of symptoms and conjugate
clumps of people. If the botryological calculations are successful we should
discover  new  diseases  or  complaints,  or  at  least  syndromes,  together  with  the 
people  who  suffer  from  them.  (A  syndrome  is  a  collection  of  correlated  symptoms 
whose  causal  relationship  is  often  poorly  understood.) 
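
One common form of such an iterative procedure, sketched here as an assumption rather than as the exact method of the references, alternates between the two sides of the incidence matrix B: a document vector x is computed from a word vector y as By, then y is recomputed as B'x, with normalization at each step. The function name and starting vector are hypothetical, and B is assumed to be non-zero.

```python
import math


def leading_singular_vectors(B, iterations=100):
    """Alternate x <- B y and y <- B' x, normalizing each time.

    B is a list of rows (documents) over columns (words). Returns
    (sigma, x, y) approximating the largest singular value and its
    left and right singular vectors.
    """
    rows, cols = len(B), len(B[0])
    y = [1.0] * cols                      # assumed starting vector
    x = [0.0] * rows
    sigma = 0.0
    for _ in range(iterations):
        x = [sum(B[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        norm_x = math.sqrt(sum(v * v for v in x))
        x = [v / norm_x for v in x]
        y = [sum(B[i][j] * x[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(v * v for v in y))
        y = [v / sigma for v in y]
    return sigma, x, y
```

The large components of x and y pick out a clump of documents and its conjugate clump of words. Only products of B with a vector are needed, so a sparse incidence matrix is never expanded into a relatedness matrix.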

Botryology  can  be  regarded  as  the  science  of  concept  formulation.  A 
concept can often be thought of as a clump of previously existing concepts.

Example  (vii).  Speech  recognition  without  tuition. 

Different  people  use  different  phonemes  and  this  is  a  source  of  difficulty 
for  any  speech  recognition  machine.  But  even  without  tuition  a  machine  might 
be able to categorize the phonemes of a given speaker botryologically. I shall
suppose that the distinctive-feature approach is used, possibly with time
derivatives, so that each speech sound will be represented by a binary vector
in m dimensions, where $12 \le m \le 24$. A stretch of speech is to be converted
by the machine into a sequence of n, say, such vectors. Many of these vectors
will represent transitions between phonemes since we cannot assume that
the problem of segmentation of the phonemes can be solved at the start. The
machine now has a binary matrix B with m rows and n columns. This can be
treated by a method which I call "crude convergence" which is an iterative
method of maximizing x'By where x and y are binary vectors ((25), p. 120).

After  convergence  we  could  extract  the  two  quarter-matrices,  corresponding 
respectively to the positive and to the negative components of these vectors,
and repeat the process. In this manner we might be able to obtain a dichotomous
dendroidal categorization of the type shown in the diagram. A slight generalization
would allow polytomies.
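
One plausible reading of such an iteration is sketched below; the details of the cited method are not reproduced here. It assumes that B is coded with entries +1 and -1 and that x and y have components +1 or -1, and it alternately sets x to the signs of By and y to the signs of B'x. Neither step can decrease x'By, so the iteration climbs to a local maximum. The function names, the starting vector, and the tie-breaking rule are hypothetical.

```python
def sign(v):
    # Ties are broken towards +1 (an arbitrary choice).
    return 1 if v >= 0 else -1


def crude_convergence(B, max_sweeps=100):
    """Locally maximize x'By over vectors x, y with components +1 or -1."""
    m, n = len(B), len(B[0])
    y = [1] * n                                   # assumed starting vector
    x = [1] * m
    for _ in range(max_sweeps):
        new_x = [sign(sum(B[i][j] * y[j] for j in range(n)))
                 for i in range(m)]
        new_y = [sign(sum(B[i][j] * new_x[i] for i in range(m)))
                 for j in range(n)]
        if new_x == x and new_y == y:
            break                                 # fixed point reached
        x, y = new_x, new_y
    score = sum(x[i] * B[i][j] * y[j] for i in range(m) for j in range(n))
    return x, y, score
```

The positive and negative components of x and y then split the rows and columns into two groups each, and the procedure can be repeated on the two quarter-matrices in which the signs agree.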


An alternative and more classical approach, which however would probably
involve far more calculation, would be to start with the correlation or
covariance  matrix  of  the  n  original  vectors.  In  any  case  the  "transition 
phonemes"  would  mostly  be  too  rare  to  be  relevant,  and  those  that  were  not 
rare  might  deserve  to  be  called  phonemes.  To  finish  off  the  job  would  be 
a  problem  like  the  solution  of  a  simple  substitution  cipher,  but  many  of 
the  phonemes  would  be  given  only  probabilistically. 

It would be interesting to try this process both on human languages
and on the sounds made by dolphins and whales, which are linguistic for
all we know. Of course with unknown languages the transformation of the
speech into a sequence of phonemes is only a small step in the solution,
but  a  necessary  one. 

Example (viii). Medical diagnosis. If we can solve the probability
estimation problems we can of course apply Bayes' theorem in order to do
automatic medical diagnosis. If it is too difficult to obtain a really
good Bayesian model we can use a less good one and then interpret the
resulting Bayesian log-factors or weights of evidence as orthodox
non-Bayesian statistics. This is an example of the Bayes/non-Bayes compromise.
(See, for example, (25).)

Assuming  a  Bayesian  model  how  do  we  choose  between  two  "facets"  for  the 
eliciting  of  a  datum?  This  question,  raised  by  Card  (8),  may  be  regarded 
as  a  special  case  of  that  of  how  to  design  an  experiment.  More  generally 
we might wish to decide between a number of facets and a number of treatments.
Theoretically we should use the principle of rationality. But utilities are
often difficult to judge, so we might instead use measures of information,
evidence, or corroboration as if they were utilities (see, for example, (14),
p. 72; (16, 20, 37)).

The various possible diseases or complaints can be regarded as hypotheses,
$H_1, H_2, H_3, \ldots$, but these unfortunately are not necessarily mutually
exclusive. (The same complication arises in chemical analysis.) At any
moment let us suppose we have current probabilities $P(H_1) = p_1, \ldots, P(H_m) = p_m$.
If we elicit a datum E these probabilities change to $P(H_1 \mid E) = q_1, \ldots,
P(H_m \mid E) = q_m$. A reasonable criterion of how well off we are in our diagnostic
work is the entropy $-\sum_i q_i \log q_i$. The smaller the entropy the closer we
are to a complete diagnosis. So a possible criterion for the selection of
the facet is to arrange to minimise the expected entropy, assuming of course
that the expected cost (in time, effort, and danger to the patient) is the
same for the various alternative selections. (Otherwise we must allow for
this cost.)
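
The selection of a facet by minimum expected entropy can be sketched as follows. The sketch assumes, for simplicity, that the hypotheses are mutually exclusive and exhaustive (which the text notes need not hold in practice) and that all facets cost the same; the function names and the data layout are hypothetical.

```python
import math


def entropy(probs):
    """-sum q log q, with 0 log 0 taken as 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def expected_final_entropy(prior, likelihood):
    """Expected entropy of the hypotheses after one facet is observed.

    prior[i] = P(H_i); likelihood[i][j] = P(E_j | H_i) for that facet.
    Assumes the hypotheses are mutually exclusive and exhaustive.
    """
    total = 0.0
    for j in range(len(likelihood[0])):
        joint = [prior[i] * likelihood[i][j] for i in range(len(prior))]
        p_e = sum(joint)                           # P(E_j)
        if p_e > 0:
            posterior = [v / p_e for v in joint]   # Bayes' theorem
            total += p_e * entropy(posterior)
    return total


def best_facet(prior, facets):
    """Name of the facet with the smallest expected final entropy."""
    return min(facets,
               key=lambda name: expected_final_entropy(prior, facets[name]))
```

A facet whose outcomes are equally likely under every hypothesis leaves the entropy unchanged, whereas one whose outcomes identify the hypothesis reduces the expected entropy to zero, so the criterion prefers the latter.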

It is interesting to note that it makes sense to maximize entropy when
estimating probabilities, but to minimize its expectation when planning an
experiment to obtain revised estimates of the probabilities. This seems
analogous to the fact that the physical entropy of isolated systems tends
to a maximum (the Second Law of Thermodynamics) whereas in the evolution of
living systems, entropy tends to decrease. This has been called the Fourth
Law of Thermodynamics (28). Negative entropy, which includes food and social
order, can be regarded as a physical expression for utility, at least as an
approximation. Life fights a game against Death with negentropy as the prize.

More generally, when planning an experiment for which we intend to
estimate probabilities by maximizing entropy we could try to minimise the
expected maximum entropy: that is to minimax the entropy. It is as if we were
playing a game against Nature, where we try to maximise utility interpreted
as negentropy, and Nature tries to minimise it. Minimaxing of expected loss
(i.e. maximinning of expected utility) was proposed as a statistical principle
by Abraham Wald and has been defended not as rational but as prudent by R.B.
Braithwaite. As far as I know the suggestion of minimaxing entropy is new
and, since it implies that Death rather than Nature is an opponent, I think
it makes better sense than minimaxing expected loss.


The principle of minimizing (or minimaxing) expected entropy can be
derived from another principle, that of maximizing (or maximinning) expected
amount of information. Suppose that we have several hypotheses $H_1, \ldots, H_m$
(typically $H_i$) and we wish to select an experimental set-up for which the
possible results are $E_1, E_2, \ldots, E_n$ (typically $E_j$). The expected amount
of information from the experiment is

$$\mathcal{E}_{i,j}\, I(H_i : E_j) = \mathcal{E}_{i,j} \log \frac{P(H_i \mid E_j)}{P(H_i)} = \mathcal{E}_{i,j} \log P(H_i \mid E_j) - \mathcal{E}_{i} \log P(H_i),$$

where the colon denotes "provided by."

The second term does not depend on the experimental set-up. So maximizing
the expectation of $I(H_i : E_j)$ is equivalent to maximizing the expectation of
$-$[entropy of $(H_1, \ldots, H_m)$ conditional on $E_j$], that is, it is equivalent
to minimizing the expected final entropy of $(H_1, \ldots, H_m)$ by appropriate
selection of the experimental set-up $(E_1, \ldots, E_n)$.
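
The equivalence above can be sketched in Python under stated assumptions: the hypotheses are mutually exclusive and exhaustive, each candidate set-up is described by a table of likelihoods P(E_j | H_i), and logarithms are taken to base 2. The function names and all numbers are hypothetical and serve only as an illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information(prior, likelihood):
    """Expected information about the hypotheses provided by one set-up.

    prior[i]         = P(H_i)
    likelihood[i][j] = P(E_j | H_i)
    Returns (initial entropy) - (expected final entropy), in bits.
    """
    n_hyp = len(prior)
    n_outcomes = len(likelihood[0])
    expected_final = 0.0
    for j in range(n_outcomes):
        # P(E_j) = sum_i P(H_i) P(E_j | H_i)
        p_e = sum(prior[i] * likelihood[i][j] for i in range(n_hyp))
        if p_e == 0:
            continue  # an outcome that cannot occur contributes nothing
        posterior = [prior[i] * likelihood[i][j] / p_e for i in range(n_hyp)]
        expected_final += p_e * entropy(posterior)
    return entropy(prior) - expected_final

# Hypothetical numbers: two hypotheses and two candidate set-ups.
prior = [0.5, 0.5]
setups = {
    "sharp test": [[0.9, 0.1], [0.1, 0.9]],
    "useless test": [[0.5, 0.5], [0.5, 0.5]],
}
gains = {name: expected_information(prior, lik) for name, lik in setups.items()}
best = max(gains, key=gains.get)
```

Because the initial entropy is the same for every set-up, ranking the set-ups by expected information gives the same order as ranking them by lowest expected final entropy.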


Information is not an absolute measure of utility and should not be used
if we have a better measure. Alternatives are degrees of corroboration and
especially weight of evidence (for example, (14, 20)). We might then try to
maximize the expected weight of evidence (the vinculum denotes negation):


$$\mathcal{E}_{i,j}\, W(H_i : E_j) = \mathcal{E}_{i,j} \log \frac{P(E_j \mid H_i)}{P(E_j \mid \bar{H}_i)} = \mathcal{E}_{i,j} \log \frac{O(H_i \mid E_j)}{O(H_i)} = \mathcal{E}_{i,j} \log O(H_i \mid E_j) - \mathcal{E}_{i} \log O(H_i).$$

The second term is again independent of the experimental set-up. So the
maximization of $\mathcal{E}_{i,j}\, W(H_i : E_j)$ is equivalent to minimizing ["odds entropy"
of the hypotheses conditional on the experimental set-up]

$$-\mathcal{E}_{j} \Big\{ \sum_i P(H_i \mid E_j) \log O(H_i \mid E_j) \Big\}.$$

Another possibility is the expected logarithm of the "repeat rate":

$$\mathcal{E}_{j} \log \sum_i [P(H_i \mid E_j)]^2 .$$

Information has the formal advantage over (weight of) evidence that, owing
to an additional additive property, the principle of maximizing expected
information is consistent when applied to a pair of completely independent
problems. (The logarithm of the repeat rate is also additive.) But since
neither information nor evidence is exactly a utility, this formal advantage
of information over evidence is not decisive, and my view is that maximizing
expected weight of evidence is better at least when there are only two
hypotheses, and especially when the initial odds are difficult to estimate.
It breaks down when the weight of evidence is infinite, positive or negative,
but this is rare. Even when bacilli have been taken from a patient's blood
and have satisfied twenty criteria the weight of evidence is apt to be only
of the order of 6 bans (a Bayes factor of $10^6$), and anyway (as Dr. Card
remarked in conversation) the patient might really be only a carrier of the
suspected disease. Nevertheless, in the acquisition of evidence, there is
sooner or later a law of diminishing returns. An advantage in using expected
weight of evidence as a pseudo-utility is that it is independent of the
initial odds of the "null" hypothesis, which can often be judged only within
a fairly wide interval. It is therefore a relevant measure until we are
confident that enough evidence has already been acquired, say until one of
the diseases is at least 100 to 1 on. Similarly, when we use the principle
of minimum entropy in the design of an experiment and have difficulty in
ascribing sharp probabilities to the hypotheses, it is prudent to ascribe
those values of the probabilities, within the intervals in which they are
judged to lie, in such a manner as to maximize our estimate of the entropy.
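
The independence from the initial odds can be illustrated with a short Python sketch, assuming two hypotheses (H and not-H) and weights of evidence measured in decibans, that is, 10 times the base-10 logarithm of the Bayes factor. The function names and the test characteristics are hypothetical.

```python
import math

def weight_of_evidence_db(p_e_given_h, p_e_given_not_h):
    """W(H : E) in decibans: 10 * log10 of the Bayes factor.

    Raises ZeroDivisionError when P(E | not H) is 0; this is the
    infinite-weight case in which the principle is said to break down.
    """
    return 10 * math.log10(p_e_given_h / p_e_given_not_h)

def expected_weight_given_h(lik_h, lik_not_h):
    """Expected weight of evidence in favour of H when H is true, in decibans.

    lik_h[j] = P(E_j | H), lik_not_h[j] = P(E_j | not H).
    No initial odds appear anywhere in this calculation.
    """
    return sum(p * weight_of_evidence_db(p, q)
               for p, q in zip(lik_h, lik_not_h) if p > 0)

def posterior_odds(prior_odds, total_decibans):
    """Final odds = initial odds * Bayes factor."""
    return prior_odds * 10 ** (total_decibans / 10)

# Hypothetical test: positive in 80% of cases with the disease, 10% without.
per_test = expected_weight_given_h([0.8, 0.2], [0.1, 0.9])

# 6 bans = 60 decibans corresponds to a Bayes factor of one million.
odds_after_6_bans = posterior_odds(1.0, 60.0)
```

The initial odds enter only in `posterior_odds`, which is where a stopping rule such as "at least 100 to 1 on" would be checked.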

This proposal, to maximize the entropy estimate within the judged intervals,
is another form of the principle of minimaxing expected entropy,
closely related to but not identical with the principle mentioned before.
Another two candidates for maximization in the design of an experiment are
(writing v for "or", - for "not", : for "provided by", | for "given", and / for
"as against"):

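$$\sum_{i<i'} P(H_i \vee H_{i'}) \Big[ P(H_i \mid H_i \vee H_{i'})\, \mathcal{E}_j \{ W(H_i / H_{i'} : E_j) \mid H_i \} + P(H_{i'} \mid H_i \vee H_{i'})\, \mathcal{E}_j \{ W(H_{i'} / H_i : E_j) \mid H_{i'} \} \Big]$$

and

$$\sum_{i<i'} P(H_i \vee H_{i'}) \Big[ \mathcal{E}_j \{ W(H_i / H_{i'} : E_j) \mid H_i \} + \mathcal{E}_j \{ W(H_{i'} / H_i : E_j) \mid H_{i'} \} \Big] = \sum_{i,i',j} P(H_i) \big[ P(E_j \mid H_i) - P(E_j \mid H_{i'}) \big] \log \frac{P(E_j \mid H_i)}{P(E_j \mid H_{i'})} .$$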


We could here give additional weight to the (i, i') term if it is especially
important to distinguish between hypotheses (diseases) i and i'. By trial
and error we might be able to decide what measure is best to use.
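
A pairwise measure of this kind, with optional extra weight on selected pairs, might be sketched in Python as follows. The sketch assumes mutually exclusive hypotheses (so that P(H_i v H_i') = P(H_i) + P(H_i')), natural logarithms, and no zero likelihoods; the function names and the `extra_weight` argument are hypothetical.

```python
import math
from itertools import combinations

def divergence(lik_a, lik_b):
    """Sum_j [P(E_j|H_a) - P(E_j|H_b)] * log[P(E_j|H_a) / P(E_j|H_b)].

    This equals the expected weight of evidence for H_a against H_b given H_a,
    plus that for H_b against H_a given H_b. Assumes no zero likelihoods.
    """
    return sum((p - q) * math.log(p / q) for p, q in zip(lik_a, lik_b))

def pairwise_score(prior, likelihood, extra_weight=None):
    """Sum over pairs i < k of weight * P(H_i or H_k) * divergence(i, k).

    extra_weight maps a pair (i, k) to a multiplier, for pairs that it is
    especially important to distinguish; other pairs get weight 1.
    """
    extra_weight = extra_weight or {}
    total = 0.0
    for i, k in combinations(range(len(prior)), 2):
        w = extra_weight.get((i, k), 1.0)
        total += w * (prior[i] + prior[k]) * divergence(likelihood[i], likelihood[k])
    return total

# Hypothetical two-hypothesis example.
score = pairwise_score([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
doubled = pairwise_score([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], {(0, 1): 2.0})
```

A set-up would then be chosen to maximize `pairwise_score` over the candidate likelihood tables.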


In the above discussion I have ignored the hierarchical nature of many
pattern detection or diagnostic processes. These also produce statistical
problems associated with probabilistic decoding or regeneration (see, for
example, (24), pp. 37, 38, 57, 62, and 77).


Let  us  think  of  medical  diagnosis  as  analogous  to  chemical  analysis. 
What we have is a stochastic decision tree as in the diagram.


A  diagnostic  decision  tree. 


Each round node denotes a set of data and each square node denotes a facet
and a cost. Associated with each set of data is a probability vector of
all possible diagnoses. If one of these probabilities exceeds say 0.99 we
have won, that is, we have completed the diagnosis. Or we could measure the
value of an endpoint by the negentropy of the probability vector, or by one
of the other measures mentioned above. If we can estimate all the probabilities
sharply, then our optimal method of diagnosis would be performed by iterative
"expectimaxing" (see the next section), but if not then we could instead use
iterative maximinning as in a game with an opponent.

Medical and chemical diagnosis are but two examples of the problem of
recognition in general. We could clearly set up a model of the general
recognition process as a stochastic recognition tree, with iterative
expectimaxing or minimaxing of the entropy as the basis of the optimal strategy,
while holding in mind that expectimaxing the utilities would in principle be
better if the utilities could be estimated. In some military applications,
in which the objects we wish to recognize are camouflaged, we might wish to
maximin the expected utilities when fighting a clever opponent.

Example (ix). Game-playing and theorem-proving (see for example, Good
1968; Newell et al., 1959, where further references will be found). In the
Borel-von Neumann theory of games a "game of perfect information" is described
as "trivial." But in normal English usage, chess is far from being a trivial
game, and this might seem to show that, as far as chess is concerned, the von
Neumann theory of games is of rather trivial application. But properly
interpreted it does have an application, because in practice chess is a game involving
an element of luck (12). Personally I should define a non-trivial game as
one that is so complicated that its optimal strategy cannot be definitely
established and whose analysis therefore must depend on "evolving probabilities".
(See below.) This definition could be used whether or not the game is in
principle one of perfect information. In this sense a non-trivial game has
some analogy with classical statistical mechanics.


Consider an analysis tree starting with a position $\pi_0$. We must have
some rule for terminating the analysis at various positions $\pi$ which are
endpoints of the tree. This is because the tree would usually be too large
if every variation were analysed, although the number of possible games of
chess is admittedly not more than $10^{30,000}$, if the game is drawn when fifty
consecutive moves are played on each side without a capture or pawn move (13).
If we decide where to prune the tree and can evaluate the "evolving" expected
utility (see below) of each end-point, then we can work backwards by iterative
maximinning to all other points on the tree, and thus decide what move to
make in position $\pi_0$.


In order to save space I shall discuss theorem-proving at the same time
as game-playing. In theorem-proving, at any moment we have a collection of
mathematical propositions. The collection of propositions at any moment is
analogous to a chess position, and the transformation rules are analogous to
the moves of the game. If there is a particular theorem that we are trying
to prove, then if we reach a "position" which includes that theorem as one of
its propositions we have "won the game." If there is no particular theorem
we are trying to prove then the pay-off can be measured by whether we get
interesting or useful propositions. The quantification of these aspects is
of course far from being formalized.

In theorem-proving we do not have an opponent, so instead of (iterative)
maximinning we use "expectimaxing" as Michie calls it.
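
The two backing-up rules can be contrasted in a small Python sketch. It assumes a tree that has already been pruned, with an evolving expected utility guessed at each end-point; the `Node` class, its field names, and all numbers are hypothetical. With "max" and "min" nodes the procedure is iterative maximinning (game-playing); with "max" and "chance" nodes it is expectimaxing (theorem-proving or diagnosis).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    kind: str                              # "leaf", "max", "min", or "chance"
    value: float = 0.0                     # evolving expected utility at a leaf
    children: List["Node"] = field(default_factory=list)
    probs: Optional[List[float]] = None    # branch probabilities at a chance node

def back_up(node: Node) -> float:
    """Work backwards from the end-points to the value of this node."""
    if node.kind == "leaf":
        return node.value
    values = [back_up(child) for child in node.children]
    if node.kind == "max":                 # our own choice
        return max(values)
    if node.kind == "min":                 # a clever opponent's choice
        return min(values)
    if node.kind == "chance":              # no opponent: take the expectation
        return sum(p * v for p, v in zip(node.probs, values))
    raise ValueError(f"unknown node kind: {node.kind}")

def leaf(v: float) -> Node:
    return Node("leaf", v)

# Game-playing: we move, then the opponent replies.
game = Node("max", children=[
    Node("min", children=[leaf(3), leaf(5)]),
    Node("min", children=[leaf(2), leaf(9)]),
])

# Theorem-proving: we choose a step, then the outcome is uncertain.
proof = Node("max", children=[
    Node("chance", children=[leaf(3), leaf(5)], probs=[0.5, 0.5]),
    Node("chance", children=[leaf(2), leaf(9)], probs=[0.1, 0.9]),
])

game_value = back_up(game)
proof_value = back_up(proof)
```

The same end-point values lead to different choices: against an opponent the first branch is preferred, while without one the second branch has the higher expectation.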

Neither in chess nor in theorem-proving do we necessarily have a tree:
there can be closures so that we are dealing with a linear oriented graph.
But even for a graph we can number the "generations" according to their
distance from $\pi_0$, the position currently under discussion.

Suppose we have some measure for the turbulence of a position, which is
inversely related to its quiescence. A quiescent position is one in which
there are no obvious lines that urgently require analysis. We must also be
able to measure the superficial, shifting, or evolving probabilities of a
win, draw or loss at each position, for a game. These are the kind of
probabilities that change in the light of further thought without new empirical
information. For example, the evolving probability that the millionth digit
of $\pi$ is a 7 is 0.1 until we have completed the calculations ((14), p. 49).
Evolving probabilities are not strictly consistent. In practical affairs
most probabilities are of this kind. (Compare (22).) For theorem-proving we
must have a measure of how close we are to the required theorem, or else an
evolving expected utility of each move.

The decision of whether to regard $\pi$ as an endpoint depends on

(a) The depth of $\pi$ from $\pi_0$, more precisely on the probabilistic depth
$-\log P(\pi \mid \pi_0)$, where $P(\pi \mid \pi_0)$ is the probability that we shall reach $\pi$
from $\pi_0$. The effective depth of the whole tree could be defined as
$-\sum P(\pi \mid \pi_0) \log P(\pi \mid \pi_0)$ summed over all end-points of the tree. This is
an incomplete entropy since $\sum_{\pi} P(\pi \mid \pi_0) < 1$. The value of storing an analysis
of $\pi_0$ perhaps depends largely on $P(\pi_0)$ times the effective depth of this analysis.

(b) The turbulence of $\pi$.

(c) The obviousness of the outcome at $\pi$.

(d) The size of the analysis tree as a whole. (The thresholds which
help to determine the tree size need adjustment in the light of
a pilot analysis.)

(e) The time left on our clock and on the opponent's clock.
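
The quantities in item (a) can be computed directly. The following Python sketch assumes base-2 logarithms (the base is not fixed by the text) and uses hypothetical reach probabilities for three end-points.

```python
import math

def probabilistic_depth(p_reach):
    """-log2 P(pi | pi_0): a position that is unlikely to be reached counts as deep."""
    return -math.log2(p_reach)

def effective_depth(endpoint_probs):
    """-sum P log2 P over the end-points of the tree.

    The probabilities may sum to less than 1, which is why the result
    is an incomplete entropy.
    """
    return -sum(p * math.log2(p) for p in endpoint_probs if p > 0)

# Hypothetical reach probabilities of three end-points (sum 0.875 < 1).
endpoints = [0.5, 0.25, 0.125]
depths = [probabilistic_depth(p) for p in endpoints]
tree_depth = effective_depth(endpoints)
```

With these numbers the three end-points have probabilistic depths 1, 2, and 3, and the effective depth of the tree is their probability-weighted sum.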

More precisely, (a), (b), and (c) could be allowed for by guessing

$$P(\pi \mid \pi_0)\, \mathcal{E}\, \big| U(\pi \mid \$) - U(\pi) \big| ,$$

where $U(\pi)$ is the superficial utility of $\pi$ and $U(\pi \mid \$)$ is the utility of
$\pi$ after (say) a dollar's worth of analysis.


Evaluation of quiescent positions. Every chess player is taught the
approximate values of the pieces at an early stage, P = 1, B = N = 3, R = 5,
Q = 9 1/2. These are not the only features of a position but they will serve
as an example. I believe these values are proportional to the weights of
evidence in favor of winning rather than losing, a pawn being worth about
7 decibans in master chess (19). A machine can make use of a linear evaluation
function $a_P n_P + a_B n_B + \cdots$, where $n_P, n_B, \ldots$ are the numbers of pawns, bishops,
etc., and the coefficients $a_P, a_B, \ldots$ are to be determined. These coefficients
can vary in the light of the machine's experience. In other words the machine
can learn optimal values of the coefficients. (For example, (19).) The machine
can optimize the coefficients even without an instructor by analyzing positions
and minimaxing, and then choosing the coefficients so as to maximize the
correlation of the direct evaluation of positions and the "analyzed evaluation" (51).
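
A minimal Python sketch of this coefficient selection follows. It assumes that positions are summarized by material differences, that the "analyzed evaluations" have already been obtained by look-ahead, and that the machine chooses among a small set of candidate coefficient vectors; a real program would search over coefficients rather than compare two candidates. All names and numbers are hypothetical.

```python
import math

PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.5}
FLAT_VALUES = {"P": 1.0, "N": 1.0, "B": 1.0, "R": 1.0, "Q": 1.0}

def evaluate(counts, coeffs):
    """Linear evaluation a_P*n_P + a_B*n_B + ...; counts are material differences."""
    return sum(coeffs[piece] * n for piece, n in counts.items())

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_coefficients(positions, analyzed, candidates):
    """Pick the candidate whose direct evaluations correlate best with the analyzed ones."""
    def score(coeffs):
        return correlation([evaluate(p, coeffs) for p in positions], analyzed)
    return max(candidates, key=score)

# Hypothetical material differences (our count minus the opponent's).
positions = [
    {"P": 2},
    {"N": 1, "P": -1},
    {"R": 1, "B": -1},
    {"Q": 1, "R": -2},
    {"P": -1},
]
# Hypothetical evaluations of the same positions after deeper analysis.
analyzed = [2.1, 1.8, 2.2, -0.4, -1.0]

best = best_coefficients(positions, analyzed, [PIECE_VALUES, FLAT_VALUES])
```

Since correlation is unchanged by rescaling, this criterion fixes only the ratios of the coefficients, not their absolute size.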


If the various pawns, etc., are given separate identities, the machine
could discover, for example, that centre pawns are more valuable than the
side pawns. If quadratic terms are included the machine can discover that two
bishops are worth more than bishop and knight. In other words, with quadratic
terms the machine can form new concepts. Any such new concept can be added to
a basic list of concepts as a single item. In this manner there is the
possibility of higher-level concepts being formed in later experience. It is
likely to be too expensive to use cubic terms from the start. A minimal concept
could be defined as a quadratic term in an evaluation function or as an
interaction between known causative agents.


Before  leaving  the  discussion  of  game-playing  and  theorem-proving  I  have 
one  further  comment.  As  mentioned,  theorem-proving  involves  getting  from  one 
point  of  an  oriented  linear  graph  to  another.  But  many  of  the  steps  are 
reversible  and  it  can  pay  to  work  both  forwards  and  backwards.  In  fact  it 
can  be  proved  under  certain  assumptions  that  for  very  difficult  problems  the 
number of steps required if two-ended working is used is apt to be about the
square  root  of  the  number  required  when  working  forward  only  (27). 
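
The saving from two-ended working can be illustrated with a toy search problem in Python. The sketch assumes every step is reversible and uses a hypothetical "position" space of strings over the letters a and b, where a step appends a letter or deletes the last one; it is an illustration only, not the proof cited above.

```python
from collections import deque

def neighbors(s):
    """Reversible steps: append 'a' or 'b', or delete the last letter."""
    steps = [s + "a", s + "b"]
    if s:
        steps.append(s[:-1])
    return steps

def forward_search(start, goal, neighbors):
    """Breadth-first search from start; returns the number of positions generated."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return len(seen)
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

def two_ended_search(start, goal, neighbors):
    """Grow breadth-first layers from both ends until they meet."""
    if start == goal:
        return 1
    seen_a, seen_b = {start}, {goal}
    frontier_a, frontier_b = [start], [goal]
    while frontier_a and frontier_b:
        # Always expand the smaller frontier.
        if len(frontier_a) > len(frontier_b):
            seen_a, seen_b = seen_b, seen_a
            frontier_a, frontier_b = frontier_b, frontier_a
        next_frontier = []
        for node in frontier_a:
            for nxt in neighbors(node):
                if nxt in seen_b:          # the two searches have met
                    return len(seen_a) + len(seen_b)
                if nxt not in seen_a:
                    seen_a.add(nxt)
                    next_frontier.append(nxt)
        frontier_a = next_frontier
    return len(seen_a) + len(seen_b)

goal = "ab" * 6   # a target twelve steps away from the empty string
forward = forward_search("", goal, neighbors)
two_ended = two_ended_search("", goal, neighbors)
```

In this toy case the forward search generates several thousand positions and the two-ended search a few hundred, which is of the order of the square root up to a constant factor.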


Example  (x).  Design  of  an  alphabet  of  letters  or  phonemes  for  a  known 
language,  and  the  choice  of  a  vocabulary  for  teaching,  or  for  the  design  of  an 
artificial  language  for  machines  or  men.  Usually  such  designs  are  arrived  at 
purely  intuitively  and  historically,  but  they  could  be  given  a  statistical 
basis,  at  least  in  part. 


These  designs  should  allow  for  the  following  things,  for  all  of  which 
statistical  data  would  be  relevant  (29): 


(i) the rate of transmission of information;

(ii) the cost of learning the alphabet or vocabulary;

(iii) the cost of errors arising out of confusion of symbols that are
not adequately distinct (60);

(iv) historical facts which influence (i), (ii), and (iii);

(v) generality of communication: for example, it is useful if the
alphabets used for various languages are the same or similar;

(vi) (for ordinary alphabets) the relationship of the alphabet to
phonemes;

(vii) the cost of compiling dictionaries, and of making reference to
them when the vocabulary is too large to be completely learned;

(viii) the cost of asking for explanations of terms;

(ix) the cost of errors arising from guessing meanings when dictionaries
are not available or one is not willing to refer to one, or one
is unwilling to ask for an explanation.

Example (xi). Artificial neural networks. (See for example (5, 24, 35,
42, 43); many further references will be found in these.) An unlimited supply
of statistical problems can be generated by considering artificial neural
networks containing some random or pseudorandom features, but I shall not have
space to discuss these. One example is the construction of reliable circuits
using unreliable components. This is relevant to an understanding of the brain
since real neurons are unreliable at least in the sense that we lose many
thousands of them every day (7). Another example of artificial neural networks
is the class of machines called "perceptrons" (49).

Then there is the assembly theory of the brain due to Hebb (34) and
Milner (38) and the modification known as the subassembly theory which I
have speculated about (24). One of the functions of the subassembly
modification is to aid the understanding of the unconscious mind as well as the
conscious mind. These theories are all intended to be speculative and suggestive
and it is a challenging problem to formulate them with enough precision to be
able to make predictions and physical models. Many of the problems here will
perhaps be too difficult to solve other than by very expensive simulation. In
order to raise enough funds for such work it might therefore be necessary to
rely on inconclusive arguments.

Conclusions. Machine intelligence research in a wide variety of fields
should make use of statistical methods and especially methods of probability
estimation; the principle of rationality (maximization of expected utility);
the use of amounts of information and "weights of evidence" as substitutes for
utility when utility is difficult to estimate; decision trees such as those
occurring in game-playing; "evolving probabilities"; and maximum, minimum, and
minimax entropy.